

WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 



INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 


(51) International Patent Classification 5: C12N 15/00, 15/38, C07K 13/00

(11) International Publication Number: WO 92/03547 A1

(43) International Publication Date: 5 March 1992 (05.03.92)


(21) International Application Number: PCT/US91/05870

(22) International Filing Date: 19 August 1991 (19.08.91)

(30) Priority data:
572,711  24 August 1990 (24.08.90)  US
736,335  25 July 1991 (25.07.91)  US


(71) Applicant: MICHIGAN STATE UNIVERSITY [US/US];
East Lansing, MI 48824 (US).

(72) Inventors: VELICER, Leland, F.; 2678 Blue Haven Ct.,
East Lansing, MI 48823 (US). BRUNOVSKIS, Peter;
1528 B. Spartan Village, East Lansing, MI 48823 (US).
COUSSENS, Paul, M.; 7180 Cutler Road, DeWitt, MI
48820 (US).


(74) Agent: McLEOD, Ian, C.; 2190 Commons Parkway,
Okemos, MI 48864 (US).


(81) Designated States: AT (European patent), AU, BE (European
patent), CA, CH (European patent), DE (European patent),
DK (European patent), ES (European patent), FR (European
patent), GB (European patent), GR (European patent),
IT (European patent), JP, LU (European patent),
NL (European patent), SE (European patent).


Published 

With international search report. 


(54) Title: MAREK'S DISEASE HERPESVIRUS DNA SEGMENT ENCODING GLYCOPROTEINS, gD, gI AND gE
(57) Abstract 

DNA encoding glycoproteins gD, gI and gE from Marek's disease herpesvirus (MDV) is described. The DNA is useful for probes
to detect the DNA in the herpesvirus, for expression to produce the glycoproteins that can be used for producing antibodies
which specifically recognize the three glycoprotein antigens, and, in the case of the latter two genes, as potential insertion sites for
foreign genes and as possible sites for gene inactivation to attenuate MDV field isolates for vaccine purposes.




WO 92/03547 


PCT/US91/05870 


MAREK'S DISEASE HERPESVIRUS DNA SEGMENT
ENCODING GLYCOPROTEINS, gD, gI and gE

Cross-Reference to Related Application 

This application is a continuation-in-part of 
U.S. application Serial No. 07/572,711, filed August 24, 
1990. 

BACKGROUND OF THE INVENTION

(1) Field of the Invention

The present invention relates to segments of the
Marek's disease herpesvirus genome from its unique short
(US) region encoding glycoproteins gD, gI and gE, and to
novel glycoproteins produced therefrom. In particular, the
present invention relates to DNA segments containing genes
encoding these glycoprotein antigens and containing
potential promoter sequences up to 400 nucleotides 5' of
each gene. These segments are useful for probing for
Marek's disease herpesvirus, as a possible source of
Marek's disease virus (MDV) promoters, for gene expression
to produce the glycoproteins, which in turn can be used for
producing antibodies that recognize the three glycoprotein
antigens, and, in the case of the latter two genes, as
potential insertion sites for foreign genes and as possible
sites for gene inactivation to attenuate MDV field isolates
for vaccine purposes.

(2) Prior Art 

MDV is an oncogenic herpesvirus of chickens
which is known to cause T cell lymphomas and peripheral
nerve demyelination. The resulting disease, Marek's
disease (MD), was the first naturally occurring
lymphomatous disorder to be effectively controlled via
vaccination, using either the antigenically related, yet
apathogenic, herpesvirus of turkeys (HVT) or attenuated
field isolates of MDV.

Because of similar biological properties,
especially its lymphotropism, MDV has been classified as a
member of the gammaherpesvirus subfamily (Roizman, B., et
al., Intervirology 16:201-217 (1981)). Of the three
herpesvirus subfamilies, the gammaherpesviruses exhibit
particularly marked differences with regard to genome
composition and organization. For example, the
B-lymphotropic Epstein-Barr virus (EBV) of humans has a
172.3 kbp genome with 60% G+C content, is bounded by
terminal 0.5 kbp direct repeats and contains a
characteristic set of internal 3.07 kbp tandem repeats
(Baer, R., et al., Nature (London) 310:207-211 (1984)).
Herpesvirus saimiri (HVS), a T-lymphotropic herpesvirus of
New World monkeys and lower vertebrates, has an A+T rich
coding sequence (112 kbp; 36% G+C; i.e. L-DNA) without any
large-scale internal redundancy, but instead contains
more than 30 reiterations of a 1.44 kbp sequence of 71%
G+C at the termini of the genome (H-DNA) (Bankier, A. T., et
al., J. Virol. 55:133-139 (1985)). Despite the structural
differences between EBV and HVS, the genomes of these two
viruses encode serologically related proteins and share a
common organization of coding sequences which differs from
that of the neurotropic alphaherpesviruses, exemplified by
herpes simplex virus (HSV) and varicella-zoster virus (VZV)
(Cameron, K. R., et al., J. Virol. 61:2063-2070 (1987);
Davison, A. J., et al., J. Gen. Virol. 68:1067-1079 (1987);
Davison, A. J., et al., J. Gen. Virol. 67:597-611 (1986);
Davison, A. J., et al., J. Gen. Virol. 67:1759-1816 (1986);
Davison, A. J., et al., J. Gen. Virol. 64:1927-1942 (1983);
Gompels, U. A., J. Virol. 62:757-767 (1988); and
Nichols, J., et al., J. Virol. 62:3250-3257 (1988)).

In contrast to other gammaherpesviruses, MDV has
a genome structure closely resembling that of the
alphaherpesviruses (Cebrian, J., et al., Proc. Natl. Acad.
Sci. USA 79:555-558 (1982); and Fukuchi, K., et al., J.
Virol. 51:102-109 (1984)). Members of the latter subfamily
have similar genome structures consisting of covalently
joined long (L) and short (S) segments. Each segment
comprises a unique region (UL, US) flanked by a pair
(terminal and internal) of inverted repeat regions (TRL,
IRL; IRS, TRS, respectively). Alphaherpesviruses include human
HSV and VZV, porcine pseudorabies virus (PRV), bovine
herpesvirus (BHV) and equine herpesvirus (EHV). Because
MDV contains extensive repeat sequences flanking its UL
region, its genome structure most resembles that of HSV
(Cebrian, J., et al., Proc. Natl. Acad. Sci. USA 79:555-558
(1982); and Fukuchi, K., et al., J. Virol. 51:102-109
(1984)).

Recent studies (Buckmaster, A. E., et al., J.
Gen. Virol. 69:2033-2042 (1988)) have shown that the two
gammaherpesviruses MDV and HVT appear to bear greater
similarity to the alphaherpesviruses VZV and HSV than to
the gammaherpesvirus EBV. This was based on a comparison
of numerous randomly isolated MDV and HVT clones at the
predicted amino acid level; not only did individual
sequences exhibit greater relatedness to alphaherpesvirus
genes than to gammaherpesvirus genes, but the two viral
genomes were found to be generally collinear with VZV, at
least with respect to the unique long (UL) region. Such
collinearity of UL genes extends to other
alphaherpesviruses such as HSV-1, HSV-2, EHV-1 and PRV, as
evidenced by both sequence analysis (McGeoch, D. J., et
al., J. Gen. Virol. 69:1531-1574 (1988)) and DNA
hybridization experiments (Davison, A. J., et al., J. Gen.
Virol. 64:1927-1942 (1983)). Many of these UL genes are
shared by other herpesviruses, including the beta- and
gammaherpesviruses (Davison, A. J., et al., J. Gen. Virol.
68:1067-1079 (1987)). The organization and comparison of
such genes have suggested the past occurrence of large-scale
rearrangements to account for the divergence of
herpesviruses from a common ancestor. Unfortunately, such
a hypothesis fails to account for the presence of
alphaherpesvirus S component (unique short, US, and
associated inverted/terminal repeat short, IRS, TRS) genes,
which appear unique to members of this subfamily (Davison,
A. J., et al., J. Gen. Virol. 68:1067-1079 (1987); Davison,
A. J., et al., J. Gen. Virol. 67:597-611 (1986); and
McGeoch, D. J., et al., J. Mol. Biol. 181:1-13 (1985)).

The DNA sequence and organization of genes in a
5.5 kbp EcoRI fragment mapping in the US region of MDV
strain RB1B were described by Ross, Binns and Pastorek
(Ross, L. J. N., et al., J. Gen. Virol. 72:949-954 (1991)).
The properties and evolutionary relationships of four of
the predicted polypeptides were also described (Ross,
L. J. N. and M. M. Binns, J. Gen. Virol. 72:939-947
(1991)). In that fragment they found the homologs of HSV
US2, US3, US6 (gD) and US7 (gI), as well as an MDV-specific
gene of which only part was present. These reports confirm
the presence of four MDV US genes and the evolutionary
relationship proposed above. It is important to note that
no evidence for US8 (gE), or for the genes to the left of
US2, was described.

In addition to its uniqueness compared with the
beta- and gammaherpesviruses, the alphaherpesvirus US
region is particularly interesting because of marked
differences in its content and genetic organization within
the alphaherpesvirus subfamily (e.g. HSV-1 US = 13.0 kbp,
12 genes, McGeoch, D. J., et al., J. Mol. Biol. 181:1-13
(1985); VZV US = 5.2 kbp, 4 genes, Davison, A. J., et al.,
J. Gen. Virol. 67:1759-1816 (1986)). In the case of HSV-1,
11 of the 12 US genes have been found to be dispensable for
replication in cell culture (Longnecker, R., et al., Proc.
Natl. Acad. Sci. USA 84:4303-4307 (1987)). This has
suggested the potential involvement of these genes in
pathogenesis and/or latency (Longnecker, R., et al., Proc.
Natl. Acad. Sci. USA 84:4303-4307 (1987); Meignier, B., et
al., Virology 162:251-254 (1988); and Weber, P. C., et al.,
Science 236:576-579 (1987)). In the report by Buckmaster et
al. (Buckmaster, A. E., et al., J. Gen. Virol. 69:2033-2042
(1988)), except for the identification of partial MDV
sequences homologous to HSV immediate early protein alpha
22 (US1) and the serine-threonine protein kinase (US3), the
content, localization and organization of MDV S component
homologs were not determined. Moreover, despite the
presence of at least four HSV US glycoprotein genes (two in
VZV), no MDV homologs of these genes were identified.
In application Serial No. 07/229,011, filed
August 5, 1988, by inventors including Leland F. Velicer,
one of the present inventors, the Marek's disease
herpesvirus DNA segment possibly containing the gene
encoding the glycoprotein B antigen complex (gp100, gp60,
gp49) was identified but not sequenced. Antigen B is an
important glycoprotein complex because it can elicit at
least partial protective immunity; thus, the MDV DNA
segment can be used for probes, as a possible source of
promoters in the gene's 5' regulatory region, and for gene
expression to produce the glycoproteins, which in turn can
be used to produce antibodies that recognize the
glycoprotein antigens. However, there was no discussion of
the glycoproteins of the present invention. These B antigen
glycoproteins are not encoded by the US region and thus are
from a different region of the MDV genome.

In application Serial No. 07/526,790, filed May
17, 1987 by Leland F. Velicer, the MDV DNA segment
containing the gene encoding the glycoprotein A antigen
(gp57-65) is described but not sequenced. This MDV DNA
segment is useful as a probe, as a possible source of
promoters in the gene's 5' regulatory region, and for
producing antibodies by the sequence of events described
above. This DNA is also important because antigen A is now
known to be a homolog of HSV gC, a gene nonessential for
replication in cell culture. Since that property most
likely also applies to the MDV homolog, it may be useful as
a site for insertion of foreign genes. However, there was
no discussion of the glycoproteins of the present invention.
This glycoprotein is also not encoded by the US region and
is thus from a different region of the MDV genome.

Other glycoproteins are encoded by the Marek's
disease herpesvirus genome. In application Serial No.
07/572,711, filed August 24, 1990 by Leland F. Velicer, et
al., the MDV DNA containing the genes encoding the MDV gD,
gI and part of the gE glycoproteins is described, with MDV
nucleotide sequences for the complete gD and gI genes and
part of gE (MDV homologs of HSV genes US6, US7 and US8,
respectively). This MDV DNA segment is useful as a probe,
as a possible source of promoters in the gene's 5'
regulatory region, and for producing antibodies by the
sequence of events described above. The present invention
is particularly directed to the complete gene (US8)
encoding glycoprotein gE.
OBJECTS 

It is an object of the present invention to
provide sequenced DNA encoding glycoproteins gD, gI and gE,
both together and individually. It is further an object of
the present invention to provide DNA segments encoding
these glycoprotein antigens and containing potential
promoter sequences up to 400 nucleotides 5' of each gene,
which are useful as DNA probes, as a possible source of
MDV promoters, for producing antibodies which recognize the
antigens and, in the case of the latter two glycoproteins,
as insertion sites for foreign genes and as possible sites
for gene inactivation to attenuate MDV field isolates for
vaccine purposes. These and other objects will become
increasingly apparent by reference to the following
description and the drawings.
IN THE DRAWINGS 

Figures 1A to 1C show the map location, sequencing
strategy and organization of MDV open reading frames
(ORFs):

Figure 1A includes the MDV genomic structure and
restriction maps outlining the area sequenced.

Figure 1B includes map location and sequencing
strategy. Boxes define plasmid clones with BamHI-, EcoRI- or
SalI-bound inserts that were used to generate M13mp18 and
-19 templates for DNA sequencing. Rightward and leftward
arrows define sequences derived from the top and bottom
strands, respectively. The restriction enzyme sites are
identified as: B = BamHI, E = EcoRI, Nc = NcoI, Ns = NsiI,
S = SalI, and P = PstI. Sequences derived from random
libraries (Sau3A, TaqI, RsaI), specific cloned restriction
fragments, Bal31-digested libraries or synthetically
derived oligonucleotides are denoted by a, b, c and d,
respectively.

Figure 1C includes the organization of the MDV US
ORFs. Numbers refer to homologs based on relation to HSV-1
US ORF nomenclature (McGeoch, D. J., et al., J. Mol. Biol.
181:1-13 (1985)). Boxes represent the locations of MDV ORFs.
Arrows define the direction of transcription/translation.
Names of ORFs are displayed above the boxes. Potential
polyadenylation signals on the top and bottom strands are
highlighted by AATAAA and AAATAA, respectively. SORF1 and
SORF2 are MDV-specific S component ORFs given arbitrary
names.
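Marking candidate polyadenylation signals on both strands reduces to a motif scan plus a reverse complement. The following is a minimal Python sketch, not the software the authors used; the function names are hypothetical, only the canonical AATAAA hexamer is searched, and bottom-strand hits are reported in top-strand (Figure 2) coordinates.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


def find_polya_signals(seq: str, motif: str = "AATAAA") -> dict:
    """Return 1-based top-strand start positions of `motif` on each strand.

    A bottom-strand AATAAA reads as its reverse complement (TTTATT) on the
    top strand, so one pass over the top strand finds hits on both strands.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = {"top": [], "bottom": []}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits["top"].append(i + 1)
        if window == rc_motif:
            hits["bottom"].append(i + 1)
    return hits


print(find_polya_signals("GGAATAAACCTTTATTGG"))  # {'top': [3], 'bottom': [11]}
```

Because AATAAA is not its own reverse complement, a single site is never counted on both strands.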

Figure 2 shows nucleotide and predicted amino
acid sequences. The nucleotide sequence is given as the
rightward 5' to 3' strand only (numbered 1 to 10350).
Rightward- and leftward-directed predicted amino acid
sequences are shown in single-letter code above and below
the corresponding nucleotide sequences, respectively.
The name of each ORF is given to the left of the first line
of the amino acid sequence. Amino acid sequences are
numbered from the first M (Met; ATG in the DNA) at the
N-terminus to the last amino acid at the C-terminus, which
precedes the termination codon (identified by an *).
Potential TATA consensus sites located within 400
nucleotides of the ATG are underlined and defined as sites
containing at least six of seven matches to the
TATA(A/T)A(A/T) consensus sequence defined by Corden et al.
(Corden, B., et al., Science 209:1406-1414 (1980)).
Underlines longer than seven nucleotides refer to areas
containing overlapping TATA consensus sites.
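The TATA criterion above (at least six of seven positions matching TATA(A/T)A(A/T), within 400 nucleotides 5' of the ATG) can be sketched as a sliding-window count. This is an illustrative Python sketch, not the IBI/Pustell, Genepro or GCG software used for the analysis; the names `TATA_CONSENSUS` and `tata_sites` are hypothetical, and only the top strand upstream of a rightward ATG is scanned.

```python
# TATA(A/T)A(A/T) consensus of Corden et al. (1980): allowed bases per position.
TATA_CONSENSUS = ["T", "A", "T", "A", "AT", "A", "AT"]


def tata_sites(seq: str, atg_pos: int, window: int = 400, min_matches: int = 6) -> list:
    """1-based start positions of 7-nt sites that lie entirely within `window`
    nucleotides 5' of the ATG at 1-based position `atg_pos` (top strand only)
    and match at least `min_matches` of the 7 consensus positions."""
    seq = seq.upper()
    k = len(TATA_CONSENSUS)
    first = max(0, atg_pos - 1 - window)  # 0-based start of the upstream window
    last = atg_pos - 1 - k                # last 0-based start ending before the ATG
    sites = []
    for i in range(first, last + 1):
        matches = sum(base in allowed for base, allowed in zip(seq[i:i + k], TATA_CONSENSUS))
        if matches >= min_matches:
            sites.append(i + 1)  # overlapping hits are reported separately
    return sites


print(tata_sites("GGTATAAAGCATG", atg_pos=11))  # [3]
```

Here TATAAAG matches six of seven positions (G fails the final A/T), so it qualifies under the six-of-seven rule but would be rejected with `min_matches=7`. Runs of adjacent qualifying starts correspond to the longer underlines in the figure.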

Figure 3A shows an alignment of S component
homologs, showing selected regions displaying maximum amino
acid conservation. Gaps have been introduced to maximize
alignment of identical amino acids as described in Methods.
The consensus sequence (cons) shows, in capital letters,
residues shared by all, or all but one, of the viruses. In
alignments between more than two sequences, asterisks (*)
indicate residues conserved by all of the viruses. Amino
acid numbers (with respect to the 5'-ATG) of the
corresponding aligned regions are listed before and after
each sequence.
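The consensus rule in the Figure 3A legend can be expressed as a column-by-column pass over an existing gapped alignment. The sketch below is not GCG's PRETTY program. Two choices are assumptions: the "." placeholder for columns without a consensus residue, and the requirement of identity when only two sequences are compared. The function name is hypothetical.

```python
from collections import Counter


def consensus_line(aligned: list) -> tuple:
    """Consensus and conservation lines for equal-length gapped protein sequences.

    A column receives its most common residue when that residue is shared by
    all, or all but one, of the sequences (assumed to mean identity for a
    pair); otherwise it receives '.'. With more than two sequences, '*' marks
    columns where every sequence has the same residue. Gaps ('-') never count.
    """
    n = len(aligned)
    if n < 2 or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least two sequences of equal length")
    threshold = max(n - 1, 2)
    cons, stars = [], []
    for column in zip(*aligned):
        counts = Counter(r.upper() for r in column if r != "-")
        residue, count = counts.most_common(1)[0] if counts else ("-", 0)
        cons.append(residue if count >= threshold else ".")
        stars.append("*" if n > 2 and count == n else " ")
    return "".join(cons), "".join(stars)


print(consensus_line(["MKLV", "MKIV", "MRIV"]))  # ('MKIV', '*  *')
```

With three or more sequences, no two residues can both reach the all-but-one threshold in the same column, so the most common residue is unambiguous.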

Figure 3B shows dot matrix analyses
depicting overall homologies between selected
MDV-alphaherpesvirus S segment homolog comparisons. Points
were generated where at least 15 amino acids over a sliding
window length of 30 were found identical or similar. The
resulting diagonals illustrate regions showing greatest
conservation. Amino acid numbers (with respect to the
5'-ATG) of corresponding sequences are denoted above and to
the right of each plot.
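The dot-matrix criterion (at least 15 identical or similar residues in a sliding 30-residue window) can be sketched in Python as follows. This is not GCG's COMPARE/DOTPLOT. The residue groups that define "similar" are a hypothetical stand-in for GCG's scoring table. The sketch also records the start of each qualifying window pair, which may differ from where COMPARE places its points.

```python
# Hypothetical "similar residue" groups: a crude stand-in for the scoring
# table GCG's COMPARE uses to decide conservative replacements.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: g for g in SIMILAR_GROUPS for aa in g}


def similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return a == b or (a in GROUP_OF and GROUP_OF.get(b) == GROUP_OF[a])


def dot_matrix(seq1: str, seq2: str, window: int = 30, stringency: int = 15) -> list:
    """1-based (i, j) window starts where at least `stringency` of the
    `window` aligned residue pairs are identical or similar."""
    points = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            score = sum(similar(seq1[i + k], seq2[j + k]) for k in range(window))
            if score >= stringency:
                points.append((i + 1, j + 1))
    return points


print(dot_matrix("ACDEFG", "ACDEFG", window=3, stringency=3))  # [(1, 1), (2, 2), (3, 3), (4, 4)]
```

The toy call uses a small window so the output is readable. A self-comparison yields the main diagonal, and conserved segments between two homologs appear as the shorter diagonals described above.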

Figure 4 shows a comparison of overall genome
organization of available S component ORFs (Audonnet,
J.-C., et al., J. Gen. Virol. 71:2969-2978 (1990); McGeoch,
D. J., et al., J. Gen. Virol. 68:19-38 (1987); Tikoo, S.
K., et al., J. Virol. 64:5132-5142 (1990); Van Zijl, M., et
al., J. Gen. Virol. 71:1747-1755 (1990); Zhang, G., et al.,
J. Gen. Virol. 71:2433-2441 (1990); Cullinane, A. A., et
al., J. Gen. Virol. 69:1575-1590 (1988); Davison, A. J., et
al., J. Gen. Virol. 67:1759-1816 (1986); McGeoch, D. J., et
al., J. Mol. Biol. 181:1-13 (1985); Petrovskis, E. A., et
al., Virology 159:193-195 (1987); Petrovskis, E. A., et
al., J. Virol. 60:185-193 (1986); and Petrovskis, E. A., et
al., J. Virol. 59:216-223 (1986)). Numbers above each ORF
refer to homologs based on relation to HSV-1 US ORF
nomenclature (McGeoch, D. J., et al., J. Mol. Biol.
181:1-13 (1985)). Alternative polypeptide designations
common to each system are listed below those ORFs where
applicable. Upper and lower solid bars refer to
rightward- and leftward-directed ORFs, respectively. Arrows
refer to identified IRS-US and/or US-TRS junction sites.

Figure 5 shows the sequence of steps necessary
to produce a complete segment of Marek's disease
herpesvirus DNA encoding glycoprotein gI and the part of
gE included in the application filed August 24, 1990.
GENERAL DESCRIPTION

The present invention relates to a segment of
DNA of the Marek's disease herpesvirus genome encoding
multiple glycoproteins and containing potential promoter
sequences up to 400 nucleotides 5' of each gene, spanning
nucleotides 1 to 10350 of the sequence shown in Figure 2
(and identified as SEQ ID No:1).

Further, the present invention relates to an
EcoRI-I segment of the Marek's disease herpesvirus genome
encoding the glycoprotein D precursor, and to subsegments
of the DNA.

Further still, the present invention relates to
a segment of DNA encoding the glycoprotein gD precursor
between nucleotides 5964 and 7172 of the Marek's disease
herpesvirus DNA sequence, and containing potential promoter
sequences up to 400 nucleotides 5' of the gene, as shown
in Figure 2 (and identified as part of SEQ ID No:1), and to
subsegments of the segment of DNA which recognize the DNA.

The present invention also relates to a segment
of DNA encoding the glycoprotein gI precursor between
nucleotides 7282 and 8346 of the Marek's disease
herpesvirus DNA sequence, and containing potential promoter
sequences up to 400 nucleotides 5' of the gene, as shown in
Figure 2 (and identified as part of SEQ ID No:1), and to
subsegments of the segment that recognize the DNA.

The present invention also relates to a segment
of DNA encoding the glycoprotein gE precursor between
nucleotides 8488 and 9978 of the Marek's disease
herpesvirus DNA sequence, and containing potential promoter
sequences up to 400 nucleotides 5' of the gene, as shown in
Figure 2 (and identified as part of SEQ ID No:1), and to
subfragments of the DNA that recognize the DNA.
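The three coding spans claimed above can be sanity-checked with simple arithmetic. The sketch below assumes the coordinates are inclusive, top-strand positions in the Figure 2 numbering, and that all three genes are rightward-directed. The text does not state this orientation here, so it is treated as an assumption. Under these assumptions, the sketch reports each span's length, whether the length is a whole number of codons, and the window of up to 400 nucleotides 5' of the gene that the text describes as containing potential promoter sequences. `GENES` and `describe` are illustrative names.

```python
# Coordinates taken from the text above; assumed inclusive, in Figure 2 numbering.
GENES = {"gD": (5964, 7172), "gI": (7282, 8346), "gE": (8488, 9978)}


def describe(start: int, end: int, upstream: int = 400) -> dict:
    """Length, codon count and candidate promoter window for a span that is
    assumed to be a rightward (top-strand) gene."""
    length = end - start + 1
    return {
        "length_nt": length,
        "whole_codons": length % 3 == 0,
        "codons": length // 3,
        # Up to `upstream` nt immediately 5' of the gene, clipped at position 1.
        "promoter_window": (max(1, start - upstream), start - 1),
    }


for name, (start, end) in GENES.items():
    print(name, describe(start, end))
```

Under these assumptions, all three spans are whole numbers of codons (403, 355 and 497). The 400-nucleotide windows for gI and gE also extend back into the preceding coding regions, because the gaps between the stated gD and gI spans and between the gI and gE spans are only 109 and 141 nucleotides.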

Further, the present invention relates to the
novel glycoprotein precursors which are produced by
expression of the genes in the segments of DNA.

Further, the present invention relates to the
potential MDV gene promoters, which are located in the 400
nucleotides 5' of each coding sequence.

SPECIFIC DESCRIPTION 
The present invention provides a sequence analysis
of a 10.35 kbp DNA stretch encompassing a majority of the
MDV US region. Altogether, seven MDV US homologs, including
three glycoprotein genes, and two additional MDV-specific
open reading frames were identified.
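Identifying candidate ORFs begins with a scan of both strands for ATG-initiated open reading frames. The sketch below is a hypothetical helper, not the authors' software. The minimum length (`min_codons`) is an arbitrary assumption, and the additional evidence used to judge which ORFs are likely to code for proteins is not modeled. Coordinates follow the Figure 2 convention of top-strand numbering, so leftward ORFs are reported with start > end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """ATG-to-stop ORFs on both strands as (strand, start, end), 1-based
    top-strand coordinates; for leftward ('-') ORFs start > end."""
    seq = seq.upper()
    n = len(seq)
    rc = seq.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand, s in (("+", seq), ("-", rc)):
        for frame in range(3):
            i = frame
            while i + 3 <= n:
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i  # walk codon by codon to the first in-frame stop
                while j + 3 <= n and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > n:
                    break  # no stop left in this frame, so no further complete ORFs
                if (j - i) // 3 + 1 >= min_codons:  # codon count includes the stop
                    if strand == "+":
                        orfs.append(("+", i + 1, j + 3))
                    else:  # map reverse-complement indices back to the top strand
                        orfs.append(("-", n - i, n - j - 2))
                i = j + 3  # skip internal ATGs of this ORF
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=2))  # [('+', 3, 11)]
```

Each reported ORF starts at the first in-frame ATG. Internal methionine codons are skipped, so the longest possible ORF in each frame is reported.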
Example 1

Materials and Methods 

Recombinant plasmids, M13 subcloning and DNA
sequencing

MDV EcoRI-O and EcoRI-I of the pathogenic GA
strain were previously cloned into pBR328 (Gibbs, C. P., et
al., Proc. Natl. Acad. Sci. USA 81:3365-3369 (1984);
Silva, R. F., et al., J. Virol. 54:690-696 (1985)) and
made available by R. F. Silva, USDA Avian Disease and
Oncology Lab, East Lansing, MI, where these clones are
maintained. GA strain BamHI-A and BamHI-P1 were previously
cloned into pACYC184 and pBR322, respectively (Fukuchi, K.,
et al., J. Virol. 51:102-109 (1984)), and were kindly
provided by M. Nonoyama, Showa University Research
Institute, St. Petersburg, FL. GA strain clone GA-02, an
EMBL-3 clone containing a partially digested MDV SalI
insert that contains BamHI-A, -P1, and additional 5' and 3'
flanking sequences (kindly provided by P. Sondermeier,
Intervet Intl. B.V., Boxmeer, The Netherlands), was used to
extend analysis to the right of the above EcoRI and BamHI
fragments. This phage clone was used to generate pUC18
subclones with smaller SalI-bound inserts (pSP18-A,
pSP18-B and pSP18-C) containing the 3' BamHI-P1-flanking
region. These clones (Figure 1B) were used to generate
M13mp18 and -19 subclones for use as templates for
nucleotide sequencing. Small- and large-scale plasmid
preparations were made using the alkaline lysis procedure
(Maniatis, T., et al., Molecular cloning: a laboratory
manual. Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York (1982)).

M13mp18 and M13mp19 phage subclones to be used
as templates for sequencing were generated using specific
restriction subfragments determined by restriction mapping,
or using Sau3A-, TaqI- or RsaI-digested viral DNA pools
ligated into the unique BamHI, AccI or SmaI sites of M13 RF
DNA, respectively. In some cases, overlapping M13 deletion
clones were obtained by processive Bal31 digestion from
AccI, NaeI or NsiI restriction sites in EcoRI-O by the
method of Poncz et al. (Poncz, M., et al., Proc. Natl. Acad.
Sci. USA 79:4298-4302 (1982)). Standard methods
(Maniatis, T., et al., Molecular cloning: a laboratory
manual. Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York (1982)) were used for restriction digestions, gel
electrophoresis, purification of DNA fragments from agarose
gels, ligations and fill-in of 5' overhangs with Klenow
fragment.
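The random Sau3A, TaqI and RsaI libraries depend on cutting viral DNA at short recognition sites. The following is a minimal sketch of a complete in-silico digest of linear DNA. The function and enzyme table are illustrative and use the standard recognition sites and top-strand cut positions (Sau3AI ^GATC, TaqI T^CGA, RsaI GT^AC). Partial digestion and sticky-end overhang lengths are ignored.

```python
# Recognition site and top-strand cut offset (Sau3AI ^GATC, TaqI T^CGA,
# RsaI GT^AC). All three sites are palindromic, so scanning one strand
# finds every site.
ENZYMES = {"Sau3AI": ("GATC", 0), "TaqI": ("TCGA", 1), "RsaI": ("GTAC", 2)}


def digest(seq: str, enzyme: str) -> list:
    """Top-strand fragment lengths from a complete digest of linear DNA."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        if 0 < cut < len(seq):  # a cut at either end yields no fragment
            cuts.append(cut)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(digest("AAGATCAAAAGATCAA", "Sau3AI"))  # [2, 8, 6]
```

The end types fit the vector sites named above. Sau3AI leaves GATC overhangs that ligate into BamHI-cut DNA, TaqI leaves a 5'-CG overhang compatible with AccI-cut GTCGAC (the SalI/AccI site of the M13mp18/19 polylinker), and RsaI leaves blunt ends suited to SmaI.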

Ligated M13 products were transformed into
CaCl2-competent JM107 host cells and added to melted B top
agar containing 10 µl of 100 mM IPTG, 50 µl of 2% X-gal and
200 µl of a fresh overnight JM101 culture. These contents
were then plated onto B agar plates and incubated at 37°C
overnight. Recombinant (clear) plaques were then used
to infect 5 ml of YT medium diluted 1:50 with an overnight
JM101 culture, and the cultures were rotated at 37°C for
6 hours. The resulting cells were pelleted by
centrifugation for 5 minutes at room temperature, and the
supernatants were removed and stored at 4°C to retain viral
stocks of each recombinant clone.

Using the recovered supernatants,
single-stranded M13 phage DNA to be used as templates for
DNA sequencing by the dideoxy-chain termination method was
isolated according to the instructions in the M13
Cloning/Dideoxy Sequencing Instruction Manual provided by
Bethesda Research Laboratories. Recombinant M13mp phages
were further screened by electrophoresing purified
single-stranded viral DNA on 1% agarose mini-gels and
selecting those templates showing reduced mobility in
comparison to single-stranded M13mp18 control DNA.

DNA sequencing with single-stranded M13
templates was performed by the dideoxy-chain termination
method (Sanger, F., et al., Proc. Natl. Acad. Sci. USA
74:5463-5467 (1977)) employing the modified T7 DNA
polymerase Sequenase™ (United States Biochemical Corp.,
Cleveland, Ohio). A summary of the sequencing strategy is
included in Figure 1B. For DNA sequencing reactions, the
step-by-step instructions provided with the Sequenase™
sequencing kit were followed. Briefly, single-stranded M13
templates were first annealed with the universal M13
synthetic oligonucleotide primer by incubation at 65°C for
2 minutes, followed by slow cooling until the incubation
temperature was below 30°C. Following the addition of the
appropriate mixtures of deoxy- and dideoxynucleotide
triphosphates (dNTPs and ddNTPs, respectively),
radioactively labeled deoxyadenosine
5'-(alpha-thio)triphosphate ([35S]dATP, 1000-1500 Ci/mmol;
NEN-DuPont) and the Sequenase™ enzyme, synthesis of
radioactively labeled complementary strands was initiated
from the annealed primer. Four separate synthesis
reactions were each terminated by incorporation of the
specific ddNTP (ddATP, ddGTP, ddTTP or ddCTP) used in each
tube. Reaction products were electrophoresed through 7%
polyacrylamide/50% urea/Tris-borate-EDTA gels, and the
labeled chains were visualized by autoradiography. Both
strands were sequenced at least once. This was facilitated
by the use of 16 synthetic 17-mer oligonucleotides designed
from previously determined sequences and substituted for
the universal primer under the reaction conditions above
(0.5 pmol per reaction), following the general approach
described by Strauss (Strauss, E. C., et al., Anal.
Biochem. 154:353-360 (1986)).
Analysis of sequence data 
Sequences were assembled and analyzed on an IBM
Personal System/2 Model 50 microcomputer using the
IBI/Pustell (Pustell, J., et al., Nucl. Acids Res.
14:479-488 (1986)) and Genepro (Version 4.10; Riverside
Scientific Enterprises, Seattle, WA) sequence analysis
software packages, or with programs obtained from the
University of Wisconsin Genetics Computer Group (GCG;
Devereux, J., et al., Nucl. Acids Res. 12:387-395 (1984))
run on a VAX 8650 minicomputer. Searches of the National
Biomedical Research Foundation protein database
(NBRF-Protein, Release 21.0, 6/89) were made with the GCG
program FASTA (Pearson, W. R., et al., Proc. Natl. Acad.
Sci. USA 85:2444-2448 (1988)), which uses: (1) a
modification of the algorithm of Wilbur and Lipman
(Wilbur, W. J., et al., Proc. Natl. Acad. Sci. USA
80:726-730 (1983)) to locate regions of similarity; (2) a
PAM250-based scoring system (Dayhoff, M. O., et al., p.
345-352. In M. O. Dayhoff (ed.), Atlas of protein sequence
and structure, vol. 5, Suppl. 3. National Biomedical
Research Foundation, Washington, D.C. (1978)); and (3) the
alignment procedure of Smith and Waterman (Smith, T. F., et
al., Adv. Appl. Mathematics 2:482-489 (1981)) to join
together, when possible, the highest-scoring,
non-overlapping regions in order to derive an alignment and
its resulting optimized score. Dot matrix homology plots
were generated by using the GCG program DOTPLOT with the
output file from GCG's COMPARE. The latter creates a file
of the points of similarity between two predicted amino
acid sequences, for which a window length of 30 and a
stringency of 15 (in which conservative amino acid
replacements are scored positive) were chosen. Using the
GCG program GAP, specific amino acid sequences were aligned
using the algorithm of Needleman and Wunsch (Needleman,
S. B., et al., J. Mol. Biol. 48:443-453 (1970)); following
the insertion of gaps (to maximize the number of matches),
the percentages of identical and similar amino acid
residues were determined. To create multiple alignments
using GAP, output files of gapped MDV sequences were
created following successive GAP comparisons between the
MDV sequence and its homologous sequences (in descending
order of homology). These output files were used as input
sequences for subsequent runs of GAP until the alignment of
these gapped sequences could no longer be expanded by the
addition of new gaps. Following alignment, the gapped
output files were displayed and a consensus sequence was
calculated using the GCG program PRETTY. To achieve optimal
results, manual editing was employed in some cases (using
GCG's LINEUP).
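The Needleman-Wunsch step performed by GAP can be illustrated with a minimal global-alignment sketch. This is not GCG's implementation. It uses a simple match/mismatch score and a linear gap penalty with hypothetical default values, rather than GAP's amino acid scoring table and its gap weight and gap length weight. In addition, `percent_identity` counts identities only over gap-free columns, which may differ from GAP's own definition.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (gapped_a, gapped_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(x: str, y: str) -> float:
    """Identical residues as a percentage of aligned, gap-free columns."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    return 100.0 * sum(p == q for p, q in pairs) / len(pairs) if pairs else 0.0


aln_a, aln_b, s = needleman_wunsch("ACGT", "AGGT")
print(aln_a, aln_b, s, percent_identity(aln_a, aln_b))  # ACGT AGGT 2 75.0
```

The progressive multiple-alignment procedure described above would repeat a pairwise step like this one, feeding gapped output back in as input until no further gaps are added.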

Results

The 10,350 nucleotide DNA sequence presented
(Figure 2) appears to encompass a majority of the MDV (GA)
genome's unique short (US) region. A summary of the
sequencing strategy is included in Materials and Methods
and is depicted in Figure 1B. This sequence spans the US
fragments EcoRI-O and EcoRI-I and extends to a SalI site
1.55 kbp downstream of the 3' end of BamHI-P1 (Figures 1A
and 1B). Fukuchi et al. (Fukuchi, K., et al., J. Virol.
51:102-109 (1984)) have previously mapped the IRS-US
junction to a 1.4 kb BglI fragment located in the second
of five EcoRI subfragments of BamHI-A (Figure 1B). Thus,
the sequence presented here should lack between 2.6 and 4.0
kb of the 5'-proximal US region, assuming the above IRS-US
junction location can be independently confirmed. Because
the region sequenced does not extend a sufficient distance
downstream of BamHI-P1, the MDV US-TRS junction has not yet
been precisely defined (Davison, A. J., et al., J. Gen.
Virol. 67:1759-1816 (1986)). For VZV, EHV-4 and HSV-1,
this border is located about 100 bp upstream, or 1.1 and
2.7 kb downstream, respectively, of the termination codon
of their respective US8 homologs (Cullinane, A. A., et al.,
J. Gen. Virol. 69:1575-1590 (1988); Davison, A. J., et al.,
J. Gen. Virol. 67:1759-1816 (1986); and McGeoch, D. J., et
al., J. Gen. Virol. 69:1531-1574 (1988)).

The overall G+C content of the region sequenced
was found to be 41%, somewhat below the genomic MDV G+C
value of 46% (Lee, L. F., et al., J. Virol. 7:289 (1971)).
Observed frequencies of CpG dinucleotides in the whole
sequence, or in the coding regions only, did not differ
significantly from those expected from their mononucleotide
compositions (data not shown). This result agrees with
those obtained from alphaherpesviruses, while contrasting
with those obtained from gammaherpesviruses, such as the
A+T rich HVS and the G+C rich EBV, which are both deficient
in CpG dinucleotides (Honess, R. W., et al., J. Gen. Virol.
70:837-855 (1989)).
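The G+C and CpG comparisons above reduce to simple counts. The Python sketch below is an illustration, not the analysis actually used: it computes G+C content and the commonly used CpG observed/expected ratio, in which the expected number of CG dinucleotides is (fraction C) x (fraction G) x (number of dinucleotide positions). A ratio near 1 indicates no CpG suppression; values well below 1 indicate the kind of CpG deficiency described for HVS and EBV.

```python
def gc_content(seq: str) -> float:
    """Percent G+C in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed CG dinucleotides divided by the count expected from base composition."""
    seq = seq.upper()
    n = len(seq)
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = (seq.count("C") / n) * (seq.count("G") / n) * (n - 1)
    if expected == 0:
        raise ValueError("sequence lacks C or G; ratio undefined")
    return observed / expected


if __name__ == "__main__":
    s = "ACGTACGT"
    print(gc_content(s))                       # 50.0
    print(round(cpg_observed_expected(s), 3))  # 4.571
```

Applying the same ratio separately to coding regions only would correspond to the second comparison mentioned in the text.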

The region sequenced contains 9 complete ORFs
likely to code for proteins (Figure 1C; the basis for the
names is given below). This prediction was based on: (1)
homology and positional organization comparisons to other
alphaherpesvirus genes, (2) presence of potential TATA
and polyadenylation consensus sequences (Birnstiel, M. L.,
et al., Cell 41:349-359 (1985); and Corden, B., et al.,
Science 209:1406-1414 (1980)), and (3) possession of
favorable contexts for translational initiation (Kozak, M.,
J. Cell Biol. 108:229-241 (1989)). This identification
was further guided by the observation that
alphaherpesviruses such as HSV and VZV tend to contain
relatively tightly packed, unspliced and generally
nonoverlapping coding regions (Davison, A. J., et al., J.
Gen. Virol. 76:1759-1816 (1986); McGeoch, D. J., et al., J.
Gen. Virol. 69:1531-1574 (1988); McGeoch, D. J., et al., J.
Mol. Biol. 181:1-13 (1985); and McGeoch, D. J., et al., J.
Gen. Virol. 68:19-38 (1987)). Such genes, especially those
of the US regions, often share polyadenylation signals,
thereby resulting in 3'-coterminal mRNA families (Rixon, F.
J., et al., Nucl. Acids Res. 13:953-973 (1985)). Methods
for detecting protein coding regions based on the use of
MDV-derived codon frequency tables (using these and
previously published MDV sequences; Binns, M. M., et al.,
Virus Res. 12:371-382 (1989); Ross, L. J. N., et al., J.
Gen. Virol. 70:1789-1804 (1989); and Scott, S. D., et al.,
J. Gen. Virol. 70:3055-3065 (1989)) or analysis of
compositional bias (using the GCG programs CODONPREFERENCE
and TESTCODE) were largely inconclusive, suggesting that
MDV possesses relatively low codon and compositional biases
compared to those predicted from its mononucleotide
composition. However, using the GCG program FRAMES,
together with the MDV-derived codon frequency table above,
the 9 identified ORFs clearly show a significantly low
frequency of rare codon usage, which sharply contrasts with
that observed in all other potentially translatable regions
(data not shown).
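The FRAMES result rests on the idea that a genuine coding frame seldom uses rarely chosen codons. As a hedged illustration (not GCG's FRAMES algorithm), the Python sketch below derives a set of "rare" codons from a codon-usage table using an assumed frequency threshold, then reports the fraction of rare codons in each of the three forward frames and in sliding windows. The usage table, threshold and window size are hypothetical inputs; in practice they would come from an MDV-derived codon frequency table such as the one described above.

```python
from typing import Dict, List, Set


def rare_codons(usage: Dict[str, float], threshold: float) -> Set[str]:
    """Codons whose usage (e.g. per 1,000 codons) falls below an assumed threshold."""
    return {codon for codon, freq in usage.items() if freq < threshold}


def codons_in_frame(seq: str, frame: int) -> List[str]:
    """Split seq into complete codons starting at offset 0, 1 or 2."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def rare_fraction(seq: str, frame: int, rare: Set[str]) -> float:
    """Fraction of codons in one frame that belong to the rare set."""
    codons = codons_in_frame(seq, frame)
    return sum(c in rare for c in codons) / len(codons) if codons else 0.0


def windowed_rare_fraction(seq: str, frame: int, rare: Set[str], window: int) -> List[float]:
    """Rare-codon fraction in each window of `window` consecutive codons (step 1 codon)."""
    codons = codons_in_frame(seq, frame)
    return [sum(c in rare for c in codons[k:k + window]) / window
            for k in range(len(codons) - window + 1)]


if __name__ == "__main__":
    toy_usage = {"CTA": 3.0, "ATG": 22.0, "AAA": 30.0}  # hypothetical values
    rare = rare_codons(toy_usage, threshold=5.0)        # {"CTA"}
    seq = "ATGCTACTAAAA"
    print([rare_fraction(seq, f, rare) for f in range(3)])  # [0.5, 0.0, 0.0]
    print(windowed_rare_fraction(seq, 0, rare, window=2))   # [0.5, 1.0, 0.5]
```

A coding frame would be expected to show consistently low windowed values compared with the other frames.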

The predicted amino acid sequences of the
ORFs (beginning from the first ATG codon) are
shown relative to the nucleotide sequence in Figure 2.
Potential TATA sites within 400 nucleotides of the
initiation codon are underlined. Proposed ORF and
potential polyadenylation signal locations, identification
of the -3, +4 ATG context nucleotides (Kozak, M., J. Cell
Biol. 108:229-241 (1989)), as well as the lengths, relative
molecular masses and predicted isoelectric points of the
predicted translational products, are shown in Table 1.
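Reading a predicted product "from the first ATG codon" can be sketched as follows. This Python example is illustrative only: it translates with the standard genetic code from the first ATG to the first in-frame stop codon and estimates the unmodified molecular mass from commonly tabulated average residue masses plus one water molecule. The Table 1 values were produced with GCG software and may differ slightly from this estimate.

```python
from typing import Optional, Tuple

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Average residue masses in daltons (free amino acid minus water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def first_atg_orf(seq: str) -> Optional[Tuple[int, str]]:
    """Return (0-based start, protein) reading from the first ATG to the first stop."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start < 0:
        return None
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return start, "".join(protein)  # without a stop, the ORF runs off the 3' end


def average_mass_kda(protein: str) -> float:
    """Unmodified average mass in kDa (ambiguous residues X are not supported)."""
    return (sum(RESIDUE_MASS[aa] for aa in protein) + WATER) / 1000.0


if __name__ == "__main__":
    start, prot = first_atg_orf("CCATGAAATAGGG")
    print(start, prot, len(prot))  # 2 MK 2
    print(round(average_mass_kda(prot), 4))
```

For ORFs on the complementary strand (for example SORF1 and US2, whose listed start coordinates exceed their end coordinates), the same routine would be applied to the reverse complement.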

A summary of MDV data is shown in Table 1, with the
locations of ORFs, predicted polyadenylation signals
utilized, translational context nucleotides, lengths,
relative molecular sizes and isoelectric points of
predicted translation products.

TABLE 1

                                Predicted    -3, +4 ATG                Predicted
                                Polyadenyl-  Context          Length   Molecular      Predicted
Name    Start        End        ation Site   Nucleotides (a)  (aa)     Mass (kDa) (b) pI (c)

US1     [illegible]  [illegible] 1777        A,A              179      20.4           6.5
US10    1077         1715        1777        G,G              213      23.6           8.2
SORF1   2884         1832        1790        A,A              351      40.6           8.2
US2     3923         3114        1790        A,G              270      29.7           7.6
US3     4062         5240        5394        A,G              393      43.8           6.1
SORF2   5353         5793        5904        C,G              147      16.7           9.8
US6     5964         7172        10040       G,G              403      42.6 (d)       10.3 (d)
US7     7282         8346        10040       G,T              355      38.3 (d)       6.7 (d)
US8     8488         9978        10040       A,T              497      53.7 (d)       8.0 (d)

(a) Nucleotides listed relative to the -3, +4 positions,
respectively; numbering begins with the A of the ATG (AUG)
codon as position +1; nucleotides 5' to that site are
assigned negative numbers.
(b) In the absence of post-translational modifications.
(c) Calculated using the GCG program ISOELECTRIC.
(d) Based on sequences that follow the predicted signal
peptide cleavage site.
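Footnote (c) states that isoelectric points were calculated with GCG's ISOELECTRIC. The general approach can be sketched by finding, by bisection, the pH at which a Henderson-Hasselbalch estimate of net charge is zero. The pKa values below are one illustrative set, assumed here and not necessarily those used by ISOELECTRIC, so this sketch will not reproduce Table 1 exactly.

```python
# Illustrative pKa values (an assumption; published scales differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    counts = {aa: protein.upper().count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-6) -> float:
    """Bisection works because the net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(isoelectric_point("G"), 2))  # 6.1, midway between the terminal pKa values
```

Consistent with footnote (d), the mature glycoprotein values would be computed on the sequence following the predicted signal peptide cleavage site.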

In the absence of previous information
concerning these MDV ORFs, and to simplify identification,
they have been named (Figure 1C, Table 1) based on
homologous relationships to HSV-1 encoded US ORFs (McGeoch,
D. J., et al., J. Mol. Biol. 181:1-13 (1985)). When
appropriate, the letters MDV will preface the homolog's
name to indicate the ORF's origin. The two MDV-specific
ORFs have been arbitrarily named SORF1 and SORF2, based on
their location in the S component.

According to the scanning model for translation,
the 40S ribosomal subunit binds initially at the 5' end of
the mRNA and then migrates, stopping at the first AUG (ATG)
codon in a favorable context for initiating translation
(Kozak, M., J. Cell Biol. 108:229-241 (1989)). However, in
the absence of S1 nuclease and/or primer extension
analysis, definitive start sites for translation cannot be
accurately predicted. Nevertheless, likely start sites are
listed in Table 1; these refer to the location of the first
in-frame ATG codon found in the major open reading frame.
According to Kozak (Kozak, M., J. Cell Biol. 108:229-241
(1989)), as long as there is a purine in position -3,
deviations from the rest of the consensus only marginally
impair initiation. In the absence of such a purine,
however, a guanine at position +4 is essential for
efficient translation. Table 1 shows that all of the ORFs,
except for SORF2, contain the important purine residue in
the -3 position. SORF2, however, does carry the
compensating guanine in position +4.
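The Kozak criteria just described can be expressed as a small decision function. This Python sketch assumes the numbering used in Table 1 (the A of ATG is +1, so -3 lies three bases upstream and +4 is the base immediately after the ATG) and encodes only the two rules stated in the text; it is not a quantitative model of initiation efficiency.

```python
from typing import Tuple

PURINES = {"A", "G"}


def context_at(seq: str, atg_index: int) -> Tuple[str, str]:
    """Return the (-3, +4) nucleotides for an ATG whose A is at 0-based atg_index."""
    if (atg_index < 3 or atg_index + 3 >= len(seq)
            or seq[atg_index:atg_index + 3].upper() != "ATG"):
        raise ValueError("need an ATG with three upstream bases and one downstream base")
    return seq[atg_index - 3].upper(), seq[atg_index + 3].upper()


def classify_context(minus3: str, plus4: str) -> str:
    """Apply the two rules: a purine at -3 suffices; otherwise G at +4 is required."""
    if minus3 in PURINES:
        return "purine at -3"
    if plus4 == "G":
        return "no purine at -3; G at +4 compensates"
    return "weak"


if __name__ == "__main__":
    # (-3, +4) pairs as listed in Table 1.
    table1 = {"US1": ("A", "A"), "US10": ("G", "G"), "SORF1": ("A", "A"),
              "US2": ("A", "G"), "US3": ("A", "G"), "SORF2": ("C", "G"),
              "US6": ("G", "G"), "US7": ("G", "T"), "US8": ("A", "T")}
    for name, (m3, p4) in table1.items():
        print(name, classify_context(m3, p4))
```

Run on the Table 1 pairs, only SORF2 falls into the compensated category, matching the observation above.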

In the case of MDV US1, two transcriptional cap
sites have been tentatively identified by 5' S1 nuclease
protection analysis (data not shown). These sites appear
to be located at positions 200 and 207 (Figure 2), that is,
18 and 25 nucleotides downstream of a TATATAA sequence,
respectively. Based on 3' S1 data, this transcript utilizes a
polyadenylation signal located just downstream of the US10
coding region (Table 1; data not shown). Comparative
Northern blot analyses of the US region indicate that the
MDV US1 transcript appears to be the most prominent
transcript expressed at late times (72 h) post-infection,
when extensive cytopathic effects are observed (data not
shown). Phosphonoacetic acid inhibition studies have
indicated that MDV US1, in contrast to its immediate-early
HSV-1 US1 counterpart, is regulated as a late class gene
(data not shown).

Using the computer program FASTA (Pearson, W.
R., et al., Proc. Natl. Acad. Sci. USA 85:2444-2448 (1988))
with a K-tuple value of 1, each of the 9 predicted amino
acid sequences was screened against the NBRF-Protein
database (Release 21.0, 6/89) and recently published EHV-4
S segment gene sequences (11). Optimized FASTA scores of
greater than 100 were generally considered to indicate a
significant degree of amino acid similarity. The results
of this analysis are shown in Table 2.
TABLE 2. PAIRWISE COMPARISONS OF MDV AND ALPHAHERPESVIRUS S COMPONENT HOMOLOGS

US1
% similar/% identical
         MDV      HSV-1    VZV      PRV      EHV-4
MDV      -        47/26    43/27    51/33    48/30
HSV-1    47/26    -        49/29    43/25    50/29
VZV      43/27    49/29    -        51/35    54/36
PRV      51/33    43/25    51/35    -        56/41
EHV-4    48/30    50/29    54/36    56/41    -
FASTA scores
MDV      891      101      160      218      208
HSV-1    101      2,047    119      201      150
VZV      160      119      1,378    340      359
PRV      218      201      340      1,724    525
EHV-4    208      150      359      525      1,308
Length (aa)
         179      420      278      364      273*

US10
% similar/% identical
         MDV      HSV-1    VZV      EHV-4
MDV      -        45/24    40/24    45/29
HSV-1    45/24    -        49/27    49/27
VZV      40/24    49/27    -        55/32
PRV      a        a        a        a
EHV-4    45/29    49/27    55/32    -
FASTA scores
MDV      1,071    134      147      251
HSV-1    134      1,617    123      180
VZV      147      123      978      191
PRV      a        a        a        a
EHV-4    251      180      191      1,312
Length (aa)
         213      312      180      259

US2
% similar/% identical
         MDV      HSV-1    PRV
MDV      -        51/33    48/26
HSV-1    51/33    -        50/31
VZV      a        a        a
PRV      48/26    50/31    -
EHV-4    a        a        a
FASTA scores
MDV      1,421    335      118**
HSV-1    335      1,554    112
VZV      a        a        a
PRV      168**    112      1,240
EHV-4    a        a        a
Length (aa)
         270      291      256

US3
% similar/% identical
         MDV      HSV-1    VZV      PRV
MDV      -        56/38    54/33    55/33
HSV-1    56/38    -        57/41    59/36
VZV      54/33    57/41    -        58/35
PRV      55/33    59/36    58/35    -
EHV-4    a        a        a        a
FASTA scores
MDV      1,931    611      616      563
HSV-1    611      2,409    717      620
VZV      616      717      1,960    595
PRV      563      620      595      1,9[?]8
EHV-4    a        a        a        a
Length (aa)
         393      481      393      390

US6
% similar/% identical
         MDV      HSV-1    PRV      EHV-1    BHV-1
MDV      -        42/21    44/23    43/21    42/33
HSV-1    42/21    -        47/27    44/22    50/28
VZV      b        b        b        b        b
PRV      44/23    47/27    -        51/30    57/38
EHV-1    43/21    44/22    51/30    -        52/30
BHV-1    42/33    50/28    57/38    52/30    -
FASTA scores
MDV      2,068    211      279      246      291
HSV-1    211      1,999    294      253      304
VZV      b        b        b        b        b
PRV      279      294      2,116    428      730
EHV-1    246      253      428      1,995    494
BHV-1    291      304      730      494      2,148
Length (aa)
         403      394      402      395      417

US7
% similar/% identical
         MDV      HSV-1    VZV      PRV      EHV-1
MDV      -        39/22    46/23    43/25    41/23
HSV-1    39/22    -        43/24    41/26    42/23
VZV      46/23    43/24    -        47/25    46/29
PRV      43/25    41/26    47/25    -        51/30
EHV-1    41/23    42/23    46/29    51/30    -
BHV-1    a        a        a        a        a
FASTA scores
MDV      1,816    145      228      184      242
HSV-1    145      1,880    234      188      249
VZV      228      234      1,705    198      298
PRV      188      188      198      1,652    274
EHV-1    242      249      298      274      1,979
BHV-1    a        a        a        a        a
Length (aa)
         355      390      354      350      424

US8
% similar/% identical
         MDV      HSV-1    VZV      PRV      EHV-1
MDV      -        44/22    43/22    46/28    47/22
HSV-1    44/22    -        46/27    49/28    41/23
VZV      43/22    46/27    -        47/25    46/29
PRV      46/28    49/28    49/29    -        54/34
EHV-1    47/22    41/23    50/28    54/34    -
BHV-1    a        a        a        a        a
FASTA scores
MDV      2,489    192      376      243**    399
HSV-1    192      2,751    357      257      274
VZV      376      357      3,171    329      468
PRV      217**    257      329      2,923    417
EHV-1    399      274      468      417      2,821
BHV-1    a        a        a        a        a
Length (aa)
         497      550      623      577      552

a   Existence of homolog undetermined.
b   No homolog present in genome.
*   Actual length will differ somewhat, since probable [...]
**  Different score when order of comparison reversed.
While SORF1 and SORF2 do not appear to share
significant homology with any of the sequences in the
database (data not shown), the other six ORFs apart from
MDV US3 (MDV US1, US10, US2, US6, US7 and US8; Tables 1
and 2) were found to be homologous exclusively to
alphaherpesvirus S segment genes (Table 2). Because the US3
ORF represents a member of the serine-threonine protein
kinase superfamily (Hanks, K., et al., Science 241:42-
(1988)), a relatively large number of scores above 150 were
obtained. Nevertheless, these scores were 3-4 fold lower
than those obtained in comparisons with the US3 homologs of
HSV, PRV and VZV. To compare with previously established
alphaherpesvirus S segment homologies, all possible FASTA
comparisons between the seven groups of
alphaherpesvirus-related sequences are included. The
program GAP was used in similar pairwise comparisons to
generate optimal alignments in order to determine the total
percentage of identical and similar amino acids shared by
the two sequences. As shown in Table 2, homology
comparisons between MDV S segment ORFs and their
alphaherpesvirus counterparts were comparable to those
previously observed between the other alphaherpesvirus S
segment homologs themselves. In some cases MDV ORFs were
found to be more related to alphaherpesvirus homologs than
those same homologs were to their other alphaherpesvirus
counterparts (compare MDV/EHV-4 vs. HSV-1/EHV-4 US1 and
MDV/EHV-4 vs. HSV-1/EHV-4 US10 homologies). Moreover,
although VZV lacks US2 and US6 homologs, MDV, though
formally considered a gammaherpesvirus, clearly
possesses US2 and US6 homologs. The results of limited
multiple alignments for each of the seven homologs, showing
the areas of best conservation, are depicted in Figure 3A.

Dot matrix homology plots depicting overall
homologies between selected MDV-alphaherpesvirus S segment
homolog comparisons are included in Figure 3B (using a
sliding window length of 30 amino acids, in which points
are generated where at least 15 amino acids are
identical or similar). The resulting diagonals illustrate
the regions showing greatest conservation. Such regions
include, and in some cases extend upon, those regions
depicted in Figure 3A.
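A window/stringency dot plot like those in Figure 3B can be sketched as follows. This simplified Python version is an assumption, not GCG's COMPARE/DOTPLOT code: it slides a 30-residue window along every diagonal and records a point wherever at least 15 aligned pairs in the window pass a match test. The default test is plain identity; a caller can pass a similarity function instead. Points are recorded at the window's starting coordinates, whereas plotting programs typically mark the window midpoint.

```python
from typing import Callable, List, Tuple


def dot_plot(s: str, t: str, window: int = 30, stringency: int = 15,
             match: Callable[[str, str], bool] = lambda a, b: a == b
             ) -> List[Tuple[int, int]]:
    """Return (i, j) window starts where at least `stringency` of `window` pairs match."""
    points = []
    for i in range(len(s) - window + 1):
        for j in range(len(t) - window + 1):
            hits = sum(match(s[i + k], t[j + k]) for k in range(window))
            if hits >= stringency:
                points.append((i, j))
    return points


if __name__ == "__main__":
    seq = "ACDEFGHIKL"
    print(dot_plot(seq, seq, window=4, stringency=4))  # the main diagonal only
```

This brute-force form costs time proportional to len(s) x len(t) x window, which is acceptable for proteins of a few hundred residues; runs of consecutive points along a diagonal correspond to the conserved regions discussed above.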

More sensitive attempts to identify other
related proteins not detected with FASTA were made using
the GCG programs PROFILE and PROFILESEARCH. Use of these
programs permits database comparisons which rely on
information available from structural studies and, in this
case, on information implicit in the alignments of
related S component ORFs (including MDV sequences, using
GAP) (Gribskov, M., et al., Proc. Natl. Acad. Sci. USA
84:4355-4358 (1987)); nevertheless, such analyses failed to
extend upon the groups of related proteins described here.

Herpesvirus glycoprotein homologs have generally
been found to contain similar patterns of conserved
cysteine residues. In comparing the gB homologs of seven
different herpesviruses included in the alpha-, beta- and
gammaherpesvirus subclasses, there is complete conservation
of 10 cysteine residues (Ross, L. J. N., et al., J. Gen.
Virol. 70:1789-1804 (1989)). HSV-1 US6 (gD) contains 7
cysteine residues: six appear critical for correct folding,
antigenic structure and extent of oligosaccharide
processing (Wilcox, W. C., et al., J. Virol. 62:1941-1947
(1988)). Not only is this same general pattern of
cysteines conserved in the gD homologs of HSV-2 (McGeoch,
D. J., et al., J. Gen. Virol. 68:19-38 (1987)) and PRV
(Petrovskis, E. A., et al., J. Virol. 59:216-223 (1986)),
but it is conserved in the MDV gD homolog as well (full
alignment not shown). Figure 3A depicts portions of the
cysteine conservation patterns observed among the US6 (gD),
US7 (gI), and US8 (gE) homologs (in which 4, 3, and 6
conserved cysteine residues are shown, respectively).
While the MDV, VZV, PRV, and EHV-1 US8 homologs (Audonnet,
J.-C., et al., J. Gen. Virol. 71:2969-2978 (1990); Davison,
A. J., et al., J. Gen. Virol. 76:1759-1816 (1986); and
Petrovskis, E. A., et al., J. Virol. 60:185-193 (1986)) all
share a similar pattern of four conserved cysteine residues
near their amino termini, the HSV-1 and -2 counterparts
carry only two of these (McGeoch, D. J., J. Gen. Virol.
71:2361-2367 (1990); data not shown). It is quite possible
that the unique pattern of four conserved cysteines could
facilitate the formation of different secondary and
tertiary structures which might have important functional
consequences. These might be reflected by findings which
show that HSV-1 gE has Fc receptor activity (Johnson, D.
C., et al., J. Virol. 62:1347-1354 (1988)), while its PRV
and VZV counterparts do not (Edson, C. M., et al.,
Virology 161:599-602 (1987); and Zuckerman, F. A., et al.,
J. Virol. 62:4622-4626 (1988)).
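Finding conserved cysteines, as in Figure 3A, amounts to scanning a multiple alignment for columns in which every sequence carries C. The Python sketch below assumes the alignment is supplied as equal-length gapped strings (for example, exported from a GAP- or PRETTY-style alignment) and converts alignment columns back to residue numbers in a chosen sequence. The toy alignment is hypothetical.

```python
from typing import List


def conserved_cysteine_columns(alignment: List[str]) -> List[int]:
    """0-based alignment columns where every sequence has a cysteine."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [col for col in range(width) if all(row[col] == "C" for row in alignment)]


def residue_number(aligned_seq: str, col: int) -> int:
    """1-based residue number of the non-gap residue at a given alignment column."""
    if aligned_seq[col] == "-":
        raise ValueError("gap at this column")
    return len(aligned_seq[:col + 1].replace("-", ""))


if __name__ == "__main__":
    toy = ["AC-CG", "ACGCC", "-CGCA"]  # hypothetical aligned fragments
    cols = conserved_cysteine_columns(toy)
    print(cols)                                       # [1, 3]
    print([residue_number(toy[0], c) for c in cols])  # [2, 3]
```

Comparing the column lists obtained with and without a given sequence (for example, with and without the HSV-1 and HSV-2 US8 homologs) would expose differences such as the four-versus-two cysteine pattern noted above.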

Careful inspection of the N-terminal regions of
the MDV gD, gI and gE homologs has revealed that they
contain the three basic building blocks of signal peptide
sequences: a basic, positively charged N-terminal region
(n-region), a central hydrophobic region (h-region), and a
more polar C-terminal region (c-region) that seems to define
the cleavage site (von Heijne, G., J. Mol. Biol. 184:99-105
(1985)). Using a recently improved method for predicting
signal sequence cleavage sites (von Heijne, G., Nucl. Acids
Res. 14:4683-4690 (1986)), Table 3 shows the likely
position of these sites, the location of the hydrophobic
transmembrane and charged cytoplasmic domains near the
C-terminal end, and the location of potential
N-glycosylation sites.

Table 3 shows MDV US glycoprotein data on
predicted signal peptide cleavage sites, locations of
transmembrane and cytoplasmic domains, and potential
N-glycosylation sites (numbered with respect to the ATG
initiation codon).
TABLE 3

        Predicted
        Signal Peptide   Transmembrane   Cytoplasmic   N-glycosylation
Name    Cleavage Site    Domain          Domain        Sites

US6     G30-D31          358-374         375-403       87, 138, 230, 306
US7     S18-I19          269-288         289-355       147, 167, 210, 245, 253
US8     T18-A19          394-419         420-497       60, 133, 148, 203, 229,
                                                       277, 366, 388
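Potential N-glycosylation sites such as those in Table 3 are conventionally located by the sequon N-X-S/T, where X is any residue except proline. The Python sketch below assumes that rule (the stricter PROSITE pattern N-{P}-[ST]-{P} also excludes proline after the S/T) and reports 1-based positions of the asparagine counted from the initiating methionine, as in Table 3. A zero-width lookahead is used so that overlapping sequons are all found. The example peptides are made up.

```python
import re
from typing import List

# Lookahead match, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    print(n_glycosylation_sites("MNGSANPTQNAT"))  # [2, 10]; N-P-T at 6 is excluded
    print(n_glycosylation_sites("NNST"))          # [1, 2]
```

A sequon is only a potential site; whether it is used depends on the residue's topology (lumenal or extracellular) and local structure.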


Like the other gI homologs, MDV's counterpart
contains a relatively long cytoplasmic domain. However, in
contrast to the other gD homologs, MDV gD's signal peptide
contains a relatively long n-region (18 residues) that is
unusually highly charged (+4; Figure 2), considering an
overall mean value of +1.7 among eukaryotes, which
generally does not vary with length (von Heijne, G., J. Mol.
Biol. 184:99-105 (1985)). Although a more distal
methionine codon exists directly before the initiation
codon (as in the PRV gD homolog; Petrovskis, E. A., et al.,
J. Virol. 59:216-223 (1986)), the scanning model for
translation (Gribskov, M., et al., Proc. Natl. Acad. Sci.
USA 84:4355-4358 (1987)) favors usage of the more
5'-proximal initiation codon (at position 5964, Figure 2).
Further support is based on an overall translation context
that appears at least as good as, if not better than, the
one corresponding to the downstream ATG. Despite such a
prediction, a possible mRNA cap site location between the
two ATG sites, which would preclude such a prediction,
cannot be ruled out at this point.
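The n-region charge comparison above (+4 for MDV gD versus a eukaryotic mean of +1.7) is a simple count. The Python sketch assumes one common convention, counting Lys and Arg as +1 and Asp and Glu as -1 while ignoring His and the free N-terminal amino group; other conventions give somewhat different values. The MDV gD sequence itself appears in Figure 2 and is not reproduced here, so the example uses a made-up peptide.

```python
def net_charge_simple(segment: str) -> int:
    """(K + R) - (D + E) in a sequence segment; His and termini are ignored."""
    s = segment.upper()
    return (s.count("K") + s.count("R")) - (s.count("D") + s.count("E"))


def n_region_charge(signal_peptide: str, n_region_length: int) -> int:
    """Net charge of the first n_region_length residues of a signal peptide."""
    return net_charge_simple(signal_peptide[:n_region_length])


if __name__ == "__main__":
    toy = "MKRLLDKAAVLLLLALS"  # hypothetical peptide, not an MDV sequence
    print(n_region_charge(toy, 7))  # 2
```

Because the n-region boundary must itself be chosen (18 residues for MDV gD in the text), the result depends on where the h-region is taken to begin.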

One final point concerning MDV gD requires
mention. Using the 10,350 nucleotide DNA sequence as a
probe for screening the GenBank (62.0, 12/89) and EMBL
(19.0, 5/89) nucleic acid databases with the computer
program FASTA (K-tuple = 6), an optimized score of 1027,
corresponding to 91.5% nucleotide identity in a 342 bp
overlap, was obtained between MDV gD coding sequences
(6479-6814; aa 173-284; Figure 2) and a previously reported
467 bp MDV DNA segment (Wen, L.-T., et al., J. Virol.
62:3764-3771 (1988)). The latter sequence has been reported
to contain a 60 bp segment protected against DNase digestion
by binding of a 28 kD MDV nuclear antigen (MDNA) expressed
only in "latently" infected MDV-transformed lymphoblastoid
cells. In view of similarities between MDV and VZV, these
authors suggested that MDNA may function in a manner
analogous to that of EBNA-1 in immortalizing primate cells.
In their report, Wen et al. (Wen, L.-T., et al., J. Virol.
62:3764-3771 (1988)) mapped the MDNA binding site to the
same EcoRI subfragment of BamHI-A in which MDV gD is
located (EcoRI-I, Figure 1). Although our sequence
covering this region is consistent with a complete,
uninterrupted ORF containing all the characteristic
features of a glycoprotein and showing significant
homology to HSV gD, their sequence contains about 140 bases
of 5'-proximal sequence unrelated to any determined from
our 5.3 kbp EcoRI-I fragment or its adjoining 3.5 kb
sequences. The remaining 327 bp sequence (which contains
the putative nuclear antigen binding site), while clearly
resembling our gD coding sequence, fails upon computer
translation to yield any ORF longer than 30 aa.
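The statement that the 327 bp segment yields no ORF longer than 30 aa on computer translation can be checked with a six-frame scan. The Python sketch below assumes that "ORF" here means a stop-free stretch of codons (another common definition additionally requires an ATG start) and reports the longest such stretch, in codons, over all three frames of both strands.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq: str) -> int:
    """Longest run of consecutive non-stop codons over the six reading frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0
                else:
                    run += 1
                    best = max(best, run)
    return best


if __name__ == "__main__":
    print(longest_open_stretch("ATGAAATAA"))  # 3 (TTA TTT CAT on the reverse strand)
    print(longest_open_stretch("GCT" * 100))  # 100
```

Applied to the 327 bp segment, a result of 30 or fewer codons would correspond to the observation reported above; the segment itself is not reproduced here.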
Discussion 

Recent data have shown that despite MDV's
classification as a gammaherpesvirus, based on lymphotropic
properties shared with other members of this subfamily, its
genome structure (Cebrian, J., et al., Proc. Natl. Acad.
Sci. USA 79:555-558 (1982); and Fukuchi, K., et al., J.
Virol. 51:102-109 (1984)) and the genetic organization of
primarily its UL region (Buckmaster, A. E., et al., J. Gen.
Virol. 69:2033-2042 (1988)) more closely resemble those of
the neurotropic alphaherpesviruses. Moreover, in cases
where polypeptide sequences were found conserved among the
three herpesvirus subfamilies (e.g. UL genes),
significantly higher homology scores were consistently
observed against the respective alpha- rather than beta- or
gammaherpesvirus counterparts (Davison, A. J., et al., J.
Gen. Virol. 67:597-611 (1986); Buckmaster, A. E., et al.,
J. Gen. Virol. 69:2033-2042 (1988); Ross, L. J. N., et al.,
J. Gen. Virol. 70:1789-1804 (1989); and Scott, S. D., et
al., J. Gen. Virol. 70:3055-3065 (1989)). Alphaherpesvirus
S segment genes have previously been found to be unique to
members of this taxonomic subfamily (Davison, A. J., et
al., J. Gen. Virol. 68:1067-1079 (1987); and Davison, A.
J., et al., J. Gen. Virol. 67:597-611 (1986)). The
identification of seven MDV homologs of alphaherpesvirus S
segment genes in this study is consistent with the idea
that MDV shares a closer evolutionary relationship with
alphaherpesviruses than with gammaherpesviruses. This is
further supported by dinucleotide frequency analysis, which
fails to show the CpG suppression observed among
all gammaherpesviruses thus far studied (Efstathiou, S., et
al., J. Gen. Virol. 71:1365-1372 (1990); and Honess, R. W.,
et al., J. Gen. Virol. 70:837-855 (1989)). The above
situation resembles one observed with human
herpesvirus-6 (HHV-6), whose T-lymphotropism
suggested provisional classification as a gammaherpesvirus
(Lopez, C., et al., J. Infect. Dis. 157:1271-1273 (1988)).
However, subsequent genetic analysis has shown a greater
relatedness between HHV-6 and the betaherpesvirus human
cytomegalovirus (HCMV; Lawrence, G. L., et al., J. Virol.
64:287-299 (1990)).

A comparison of the genetic organization of
alphaherpesvirus S segment genes is presented in Figure 4.
These genes in some cases vary greatly in overall length,
organization and degree of homology.
Nevertheless, the overall gene layouts displayed are
consistent with a model to account for the divergence of
alphaherpesviruses from a common ancestor by a number of
homologous recombination events which result in expansion
or contraction of the inverted repeat regions and a
concomitant loss or gain of US gene(s). In the case of
VZV, six S segment homologs are lacking compared to HSV-1
(US2, US4, US5, US6, US11, US12). Some genes, such as the
US1 homologs, show particular sequence and length
divergences. Compared to HSV-1, the MDV, VZV and EHV-4 US1
homologs lack approximately 120 aa of sequence comparable
to the 5'-proximal portion of HSV-1 US1 (alpha22). Based
on Northern blot analysis, S1 nuclease protection analysis
and phosphonoacetic acid inhibition studies, and in contrast
to its relatively uncharacterized immediate-early HSV-1
counterpart, the MDV US1 gene appears to be regulated as an
abundantly expressed late class gene (data not shown). In
contrast to the other alphaherpesviruses, MDV contains two
apparently MDV-specific ORFs. Moreover, the MDV US region
appears to contain approximately 2.6 to 4.0 kb of
additional 5'-proximal sequences. Based on a comparison of
Figure 4 and consideration of the expansion-contraction
recombination scheme, it appears likely that there are
additional MDV-specific US genes.
Since MDV has long been regarded as a
gammaherpesvirus, much of the previous work interpreting
its properties has proceeded by analogy with the
association between EBV and B cells (Nonoyama, M., p.
333-341. In B. Roizman (ed.), The Herpesviruses, vol. 1.
Plenum Press (1982); and Wilbur, W. J., et al., Proc. Natl.
Acad. Sci. USA 80:726-730 (1983)). Because of MDV's closer
genetic relationship to the alphaherpesviruses, and keeping
in mind the analysis of HHV-6 above, we agree with Lawrence
et al. (Lawrence, G. L., et al., J. Virol. 64:287-299
(1990)) that the lymphotropic properties of MDV and HVT are
unlikely to be determined by molecules homologous to EBV,
and that a delineation of molecular differences between MDV
and the neurotropic alphaherpesviruses would be more
fruitful in explaining the observed biological differences
than employing analogies based on properties of
gammaherpesviruses such as EBV and HVS.

To account for such differences, the MDV US
region may be particularly important. With few exceptions,
each HSV-1 L component gene possesses an equivalent in VZV
(McGeoch, D. J., et al., J. Gen. Virol. 69:1531-1574
(1988)); a considerable number of these are related to
beta- and gammaherpesvirus genes as well (EBV counterparts
exist for 29 of 67 VZV UL genes; Davison, A. J., et al., J.
Gen. Virol. 68:1067-1079 (1987)). In contrast, the S
segments of HSV-1 and VZV differ significantly in size and
appear to be among the least related parts of the two
genomes (Davison, A. J., et al., J. Gen. Virol. 67:597-611
(1986); and Davison, A. J., et al., J. Gen. Virol.
64:1927-1942 (1983)). Recent studies have shown that 11 of
12 open reading frames contained in the HSV-1 S component
are dispensable for growth in cell culture (Longnecker, R.,
et al., Proc. Natl. Acad. Sci. USA 84:4303-4307 (1987); and
Weber, P. C., et al., Science 236:576-579 (1987)). The
maintenance and evolution of such a dispensable gene
cluster suggests the presence of functions relevant to the
virus's survival in its specific ecological niche in the
natural or laboratory animal host, rather than the presence
of functions necessary for replication (Longnecker, R., et
al., Proc. Natl. Acad. Sci. USA 84:4303-4307 (1987); and
Weber, P. C., et al., Science 236:576-579 (1987)).
Consistent with such a hypothesis are findings that HSV
mutants carrying different S component gene-specific
deletions were significantly less pathogenic and exhibited
a reduced capacity for latency establishment in mice
(Meignier, B., et al., Virology 162:251-254 (1988)). In
regard to the latter, there is evidence suggesting that
RNA transcribed from the HSV US region may be involved in
the establishment and maintenance of an in vitro latency
system employing human fetal lung fibroblast cells (Scheck,
A. C., et al., Intervirology 30:121-136 (1989)). Taken
together, the above evidence suggests potentially
important roles for MDV's US genes in tissue tropism,
latency, and/or induction of cell transformation.

A consideration of the three gD, gI and gE
homologs identified in this invention raises two other
questions of relevance to future vaccine development. The
11 HSV-1 US region genes dispensable for growth in tissue
culture described above include HSV-1 US7 (gI) and US8 (gE)
(Longnecker, R., et al., Proc. Natl. Acad. Sci. USA
84:4303-4307 (1987); and Weber, P. C., et al., Science
236:576-579 (1987)). Assuming the MDV homologs have the
same properties, these genes may be useful as sites for
insertion of foreign genes. Further, the same two MDV
homologs, and especially US8 (gE), may very likely be
involved in the pathogenicity-related issues introduced
above. Specifically, HSV gE seems to play a role in
HSV-1's ability to establish lethal infections and latency
in mice (Meignier, B., et al., Virology 162:251-254 (1988)).
Further, the gI and gE homologs of PRV of swine play a
clear role in PRV virulence for 1-day-old chickens and
young pigs (Mettenleiter, Thomas C., et al., Journal of
Virology, p. 4030-4032 (Dec. 1987)). Assuming the same
holds true for the MDV US7 (gI) and US8 (gE) homologs, it
may be possible to inactivate one or both of these genes
in very virulent MDV isolates which cause outbreaks not
prevented by current vaccines, thereby creating attenuated
vaccine viruses more closely related to the field viruses
causing disease outbreaks.

A further consideration of the three (gD, gI and
gE) homologs identified in this invention raises another
interesting question. Fully enveloped infectious MDV
virions are only known to be produced in feather follicle
epithelial cells (Payne, L. N., p. 347-431. In B. Roizman
(ed.), The Herpesviruses, vol. 1. Plenum Press (1982)).
Because of this, MDV studies have had to rely on limited
fibroblast cell cultures which only promote the spread of
cell-associated infections in vitro. Over the last 20
years, studies aimed at identifying immunogenic surface
antigens have relied on this in vitro culture system, and
altogether only two glycoprotein antigens (A antigen/gC
homolog; B antigen) have been routinely identified and
characterized (Binns, M. M., et al., Virus Res. 12:371-382
(1989); Coussens, P. M., et al., J. Virol. 62:2373-2379
(1988); Isfort, R. J., et al., J. Virol. 59:411-419 (1986);
Isfort, R. J., et al., J. Virol. 57:464-474 (1986); and
Sithole, I., et al., J. Virol. 62:4270-4279 (1988)). This
is despite the identification of the three MDV gD, gI and
gE homologs of the present invention and of two additional
glycoprotein homologs (gB and gH; Buckmaster, A. E., et al.,
J. Gen. Virol. 69:2033-2042 (1988); and Ross, L. J. N., et
al., J. Gen. Virol. 70:1789-1804 (1989)). While immune
chicken sera (ICS) from naturally infected birds are likely
to react with many, if not all, MDV-encoded surface
antigens, such complex polyclonal sera would only be useful
to the extent that antigen expression/processing in
semi-productive cell culture resembles that in feather
follicle epithelial cells. Northern blot analysis using MDV
gD-specific probes suggests that MDV gD mRNA is either not
expressed or poorly expressed in DEF cells at a time when
extensive cytopathic effects are observed (data not shown).
In light of the fact that VZV lacks a gD homolog and is
strongly cell-associated, it will be interesting to see
whether the block in MDV virion formation in primary avian
fibroblast cells is found to correlate with lack of
expression (in these cells) of a glycoprotein, such as gD,
and/or some other S component gene(s).

Because the protection against MD conferred by
attenuated MDV strains (serotype 2) or HVT (serotype 3)
appears to have an immunological basis, there is
considerable interest in identifying common antigens. In
view of this invention's identification of seven MDV US
homologs of US genes of HSV (the latter of which is clearly
less related to MDV than HVT is), it would be surprising if
the previous report showing lack of homology between the MDV
and HVT US regions (Igarashi, T., et al., Virology
157:351-358 (1987)) were proven correct. Such negative
results may reflect the limitations of homology estimates
based on hybridization, rather than on sequence analysis.

Example 2 shows the molecular cloning of a
construct containing the DNA encoding the complete MDV US7
(gI) gene and part of the MDV US8 (gE) gene. This is
accomplished using segments of DNA spanning the gI coding
region and part of the gE coding region.

Example 2 

MOLECULAR CLONING OF A CONSTRUCT CONTAINING THE DNA
ENCODING THE COMPLETE MDV US7 (gI) AND PART OF MDV US8 (gE)
Construction of a recombinant clone
(pKS-MDgI1.59) containing the complete MDV US7 (gI) coding
sequence and a portion of the MDV US8 (gE) coding sequence
requires two preexisting MDV clones, pKS-MDgD1.75 and p19P1
(Fig. 5). pKS-MDgD1.75 is a recombinant plasmid containing
the 1.75 kbp NcoI-SstII subfragment of MDV EcoRI-I ligated
into the SmaI-SstII site of the cloning vector
pBluescript KS-. This clone contains the complete MDV US6
(gD) coding sequence and additional sequences at the 3' end
which code for the first 39 amino acids (aa) of MDV gI.
p19P1 is a recombinant plasmid containing the 1.5 kbp
BamHI-P1 subfragment of MDV cloned into the unique BamHI
site of pUC19. This clone contains the entire MDV gI
coding sequence, except for the first 9 aa of its signal
sequence. In addition, at its 3' end, p19P1 contains the
sequence encoding the first 104 aa of the MDV US8 (gE) product.
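The splice relies on the two parental clones overlapping within the gI (US7) coding sequence: pKS-MDgD1.75 carries gI codons 1 to 39, and p19P1 lacks only codons 1 to 9. A minimal sketch of that coverage check follows. It assumes each clone's gI content can be written as a 1-based, inclusive codon interval. The interval encoding and the function names are illustrative only and are not part of the described procedure.

```python
from typing import Optional

# gI (US7) codons carried by each parental clone, as 1-based inclusive
# intervals. An end of None means "through the stop codon".
# (Illustrative encoding, not taken from the original procedure.)
Segment = tuple[int, Optional[int]]

GI_SEGMENTS: dict[str, Segment] = {
    "pKS-MDgD1.75": (1, 39),   # first 39 aa of gI, at the 3' end of the insert
    "p19P1": (10, None),       # all of gI except the first 9 aa
}


def covers_whole_orf(segments: dict[str, Segment]) -> bool:
    """True if the segments leave no gap between codon 1 and the stop codon."""
    reach = 0  # highest codon covered so far
    for start, end in sorted(segments.values(), key=lambda seg: seg[0]):
        if start > reach + 1:
            return False       # a gap precedes this segment
        if end is None:
            return True        # this segment runs through the stop codon
        reach = max(reach, end)
    return False               # no segment reaches the stop codon


def overlap_codons(a: Segment, b: Segment) -> int:
    """Number of codons shared by two segments (an open end is unbounded)."""
    start = max(a[0], b[0])
    ends = [e for e in (a[1], b[1]) if e is not None]
    if not ends:
        raise ValueError("both segments are open-ended")
    return max(0, min(ends) - start + 1)


print(covers_whole_orf(GI_SEGMENTS))
print(overlap_codons(GI_SEGMENTS["pKS-MDgD1.75"], GI_SEGMENTS["p19P1"]))
```

The overlap is not empty; the sketch reports 30 shared codons. A restriction site present in both clones inside that overlap can therefore act as the junction without losing any gI codons.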

To generate pKS-MDgI1.59, pKS-MDgD1.75 is first
cut with HincII, which cuts once in the multiple cloning
site of the pBluescript vector and once about 180 bp
upstream of the insert's SstII terminus. This results in
two fragments: one fragment (1.6 kbp) consists primarily of
insert sequences encoding MDV US6 (gD); the larger fragment
(3.1 kbp) consists of pBluescript vector sequences, in
addition to about 180 bp which encode the N-terminus of MDV
gI. The 3.1 kbp fragment is gel purified and self-ligated
by way of the two HincII ends. The resulting recombinant
plasmid, pKS-MDgI0.18, is then cut with SstI (in the
multiple cloning site, just downstream of the SstII site).
Prior to subsequent digestion with SstII, the cohesive SstI
ends are made blunt with T4 DNA polymerase. The
resulting 3.1 kbp SstII-SstI (blunt) fragment of pKS-MDgI0.18
is gel purified and used in the final ligation step to
create pKS-MDgI1.59. While the enzymatic manipulations of
pKS-MDgD1.75 and pKS-MDgI0.18 are taking place, p19P1 is
cut with HindIII, which cuts in the multiple cloning site
of pUC19 just downstream of the partial MDV US8 (gE) coding
sequence. Prior to digestion with SstII, the cohesive
HindIII ends are made blunt using Klenow fragment.
The smaller SstII-HindIII (blunt) fragment (1.4 kbp)
contains the majority of the MDV US7 (gI) coding sequence, in
addition to 312 nucleotides at the 3' end which encode
the N-terminus of MDV gE. This 1.4 kbp SstII-HindIII (blunt)
fragment is gel purified and ligated to the 3.1 kbp
SstII-SstI (blunt) fragment of pKS-MDgI0.18. The resulting
recombinant, pKS-MDgI1.59, contains the complete coding
sequence for MDV gI and a portion of the N-terminal gE
coding sequence. Digestion of pKS-MDgI1.59 with KpnI
yields two fragments; the smaller, 1.15 kbp fragment
contains the complete coding sequence for MDV gI.
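The fragment sizes quoted in this procedure can be checked against each other with simple arithmetic. A minimal sketch follows. It works in whole base pairs and treats every quoted size as approximate. The constant names are illustrative.

```python
# Approximate sizes quoted in Example 2, in base pairs (illustrative names).
HINCII_PIECES_OF_PKS_MDGD175 = (1_600, 3_100)  # gD fragment; vector + gI N-terminus
GI_N_TERMINUS_KEPT = 180          # retained with the 3.1 kbp vector fragment
SSTII_SSTI_BACKBONE = 3_100       # SstII-SstI (blunt) fragment of pKS-MDgI0.18
SSTII_HINDIII_INSERT = 1_400      # SstII-HindIII (blunt) fragment of p19P1
NOMINAL_GI_INSERT = 1_590         # the "1.59" in the name pKS-MDgI1.59


def codons_to_nt(n_codons: int) -> int:
    """Each codon is three nucleotides."""
    return 3 * n_codons


parent_plasmid = sum(HINCII_PIECES_OF_PKS_MDGD175)            # pKS-MDgD1.75
final_construct = SSTII_SSTI_BACKBONE + SSTII_HINDIII_INSERT  # pKS-MDgI1.59
gi_insert = GI_N_TERMINUS_KEPT + SSTII_HINDIII_INSERT         # MDV DNA in pKS-MDgI1.59
ge_tail_nt = codons_to_nt(104)    # first 104 aa of gE carried by p19P1

print(parent_plasmid, final_construct, gi_insert, ge_tail_nt)  # 4700 4500 1580 312
```

The totals fit the quoted figures within the rounding of "about 180 bp" and "1.4 kbp":

- The estimated insert is 1,580 bp, against a nominal 1.59 kbp.
- The 104 codons account for exactly the 312 nucleotides of gE coding sequence on the 1.4 kbp fragment.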

Example 3 shows the molecular subcloning of a
construct containing the complete MDV US8 (gE) gene.

Example 3

MOLECULAR CLONING OF A CONSTRUCT ENCODING THE COMPLETE MDV 
US8 (gE) 

Construction of a recombinant clone
(p18-MDgE2.53) containing the complete MDV US8 (gE) coding
sequence requires a clone other than the BamHI or EcoRI
clones used previously. GA strain clone GA-02, an EMBL-3
clone containing a partially digested MDV SalI insert
that includes BamHI-A and -P1 and additional 5' and 3'
flanking sequences (kindly provided by P. Sondermeier,
Intervet Intl. B.V., Boxmeer, The Netherlands), was used to
extend analysis 3' of the EcoRI-I and BamHI-P1 fragments.
Smaller SalI subfragments located at the 3' end of this
phage clone's MDV insert were gel purified and ligated to
pUC18 linearized with SalI (pSP18-A, pSP18-B, and pSP18-C,
Fig. 1B). The pUC18 subclone pSP18-A contains the entire
MDV US8 (gE) coding sequence and is designated p18-MDgE2.53
for ATCC deposit purposes.
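Examples 2 and 3 move through the MDV sequence by way of restriction sites. A minimal sketch of locating recognition sequences in a listed sequence follows. It uses the first and last printed lines of SEQ ID NO:1 as test input. The site table gives the standard recognition sequences of enzymes named in the text, and the function name is illustrative.

```python
# Standard recognition sequences for enzymes named in Examples 2 and 3. Each is
# its own reverse complement, so scanning one strand finds sites on both.
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "SalI": "GTCGAC",
    "SstI": "GAGCTC",
    "SstII": "CCGCGG",
}


def find_sites(seq: str, site: str, first_position: int = 1) -> list[int]:
    """Return the start coordinate of every (possibly overlapping) match of site.

    first_position is the coordinate of seq[0], so hits are reported in the
    listing's own 1-based numbering.
    """
    seq = "".join(seq.split()).upper()   # drop the listing's spacing
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i + first_position)
        i = seq.find(site, i + 1)
    return hits


# First and last printed lines of SEQ ID NO:1 (positions 1-60 and 10302-10350).
HEAD = "GAATTCCTTG AAATTGGAGT GAAATCTTTA GGGAGGGAGG TTTACCATTG TGGAGAATAT"
TAIL = "TGAACCATTG CCTCCGAGCA GTTGGCGATC CGTTGACCTG CAGGTCGAC"

print(find_sites(HEAD, SITES["EcoRI"]))                        # [1]
print(find_sites(TAIL, SITES["SalI"], first_position=10302))   # [10345]
```

Both ends of the listed sequence fall on sites named in the text: it begins with an EcoRI site and ends with a SalI site. A degenerate recognition sequence would need pattern matching in place of `str.find`.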
Index of the letter definitions used in Figure 2.
Table 4 shows the amino acids with both their single-letter
and three-letter symbols.

TABLE 4

A  Ala  Alanine          M  Met  Methionine
C  Cys  Cysteine         N  Asn  Asparagine
D  Asp  Aspartic Acid    P  Pro  Proline
E  Glu  Glutamic Acid    Q  Gln  Glutamine
F  Phe  Phenylalanine    R  Arg  Arginine
G  Gly  Glycine          S  Ser  Serine
H  His  Histidine        T  Thr  Threonine
I  Ile  Isoleucine       V  Val  Valine
K  Lys  Lysine           W  Trp  Tryptophan
L  Leu  Leucine          Y  Tyr  Tyrosine

When the DNA segments encoding glycoproteins gI
and gE are altered by insertional, site-directed or
deletion mutagenesis, the pathogenicity of the MDV may be
reduced. Also, the segments of DNA encoding the
non-essential gI and gE can be used as insertion sites for
segments of foreign DNA which encode proteins that are
antigenically active, for the purpose of producing a
recombinant vaccine.

ATCC Deposit 

The gene for MDV US6 (MDV gD) has been deposited
in a plasmid (phagemid) pKS-MDgD1.75, as ATCC 40855, with
The American Type Culture Collection, Rockville, MD, 20852,
USA.

The gene for MDV US7 (MDV gI) has been deposited
in a plasmid (phagemid) pKS-MDgI1.59, as ATCC 75040, with
The American Type Culture Collection, Rockville, MD, 20852,
USA.

The gene for MDV US8 (MDV gE) has been deposited
in a plasmid p18-MDgE2.53, as ATCC 75039, with The
American Type Culture Collection, Rockville, MD, 20852,
USA.

Attached are Sequence Listings for Sequence ID 
NOS. 1, 2 and 3 as previously described in the application. 
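In the sequence listings, each coding region is printed as codon triplets with the encoded residues underneath. A minimal sketch of that translation under the standard genetic code follows. It is checked against the first open reading frame of SEQ ID NO:1. The table and function names are illustrative, and the printed listing remains the record.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order at each position,
# with the third base varying fastest; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str) -> str:
    """Translate from the first base until the first stop codon.

    Spaces, digits and other non-letters are stripped, so a printed listing
    line can be passed in as printed. Codons containing letters other than
    A, C, G and T become 'X'.
    """
    bases = "".join(ch for ch in orf.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(bases) - 2, 3):
        aa = CODON_TABLE.get(bases[i:i + 3], "X")
        if aa == "*":
            break              # stop codon (TAA, TAG or TGA)
        protein.append(aa)
    return "".join(protein)


# Start and end of the first open reading frame of SEQ ID NO:1.
print(translate("ATG AGT CGT GAT CGA GAT CGA GCC AGA CCC GAT ACA CGA TTA 289"))
print(translate("GAC AGT AAA TTG CAT TAA"))
```

If this output disagrees with a printed residue line, one of the two probably contains a transcription error.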
(1) GENERAL INFORMATION:

(i) Applicants: Leland F. Velicer, Peter Brunovskis, 

and Paul Coussens 

(ii) Title of Invention: Marek's Disease Herpesvirus 

DNA Segment Encoding 
Glycoproteins gD, gI and gE

(iii) Number of Sequences: 3 

(iv) Correspondence Address: 

(A) Addressee: Ian C. McLeod 

(B) Street: 2190 Commons Parkway 

(C) City: Okemos 

(D) State: Michigan 

(E) County: Ingham

(F) Zip: 48864

(v) Computer Readable Form: 

(A) Medium Type: 1.44 MB 3 1/2" floppy diskette

(B) Computer: IBM PS2, Model 50 

(C) Operating System: MS-DOS 5.0 

(D) Software: PC-Write 3.02 

(viii) Attorney/Agent Information: 

(A) Name: Ian C. McLeod 

(B) Registration No.: 20,931

(C) Reference/Docket Number: MSU 4.1-132 

(ix) Telecommunication Information: 

(A) Telephone: (517) 347-4100 

(B) Telefax: (517) 347-4103

(2) Information for SEQ ID NO: 1

(i) Sequence Characteristics:

(A) Length: 10,350 base pairs 

(B) Type: nucleic acid 

(C) Strandedness: double 

(D) Topology: linear 

(ii) Molecule Type: genomic DNA 

(iii) HYPOTHETICAL: Yes 

(v) ANTI-SENSE: No 

(vi) ORIGINAL SOURCE: 

(A) Organism: MDV, GA strain 

(vii) IMMEDIATE SOURCE: 

(A) Library: genomic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCCTTG AAATTGGAGT GAAATCTTTA GGGAGGGAGG TTTACCATTG TGGAGAATAT 60

ATAGAGCAAG TAGTACATTA GGGGCTGGGT TAAAGACCAA GTAATTTTTG ACCGGATATC 120 

ACGTGATGTA AATTCTAGCA ATTATTGTTC CTAGCAGAAG ATAAAAGCTG GTAGCTATAT 180 

AATACAGGCC AAAGTCTCCA AATTACACTT GAGCAGAAAA CCTGCTTTCG GCTCCATCGG 240 

AGGCAAC ATG AGT CGT GAT CGA GAT CGA GCC AGA CCC GAT ACA CGA TTA 289 
Met Ser Arg Asp Arg Asp Arg Ala Arg Pro Asp Thr Arg Leu 
1 5 10 

TCA TCG TCA GAT AAT GAG AGC GAC GAC GAA GAT TAT CAA CTG CCA CAT 337 
Ser Ser Ser Asp Asn Glu Ser Asp Asp Glu Asp Tyr Gln Leu Pro His
15 20 25 30 

TCA CAT CCG GAA TAT GGC AGT GAC TCG TCC GAT CAA GAC TTT GAA CTT 385 
Ser His Pro Glu Tyr Gly Ser Asp Ser Ser Asp Gln Asp Phe Glu Leu
35 40 45 

AAT AAT GTG GGC AAA TTT TGT CCT CTA CCA TGG AAA CCC GAT GTC GCT 433 
Asn Asn Val Gly Lys Phe Cys Pro Leu Pro Trp Lys Pro Asp Val Ala 
50 55 60 

CGG TTA TGT GCG GAT ACA AAC AAA CTA TTT CGA TGT TTT ATT CGA TGT 481 
Arg Leu Cys Ala Asp Thr Asn Lys Leu Phe Arg Cys Phe Ile Arg Cys
65 70 75

CGA CTA AAT AGC GGT CCG TTC CAC GAT GCT CTT CGG AGA GCA CTA TTC 529 
Arg Leu Asn Ser Gly Pro Phe His Asp Ala Leu Arg Arg Ala Leu Phe
80 85 90

GAT ATT CAT ATG ATT GGT CGA ATG GGA TAT CGA CTA AAA CAA GCC GAA 577 
Asp Ile His Met Ile Gly Arg Met Gly Tyr Arg Leu Lys Gln Ala Glu
95 100 105 110 

TGG GAA ACT ATC ATG AAT TTG ACC CCA CGC CAA AGT CTA CAT CTG CGC 625 
Trp Glu Thr Ile Met Asn Leu Thr Pro Arg Gln Ser Leu His Leu Arg
115 120 125 

AGG ACT CTG AGG GAT GCT GAT AGT CGA AGC GCC CAT CCT ATA TCC GAT 673 
Arg Thr Leu Arg Asp Ala Asp Ser Arg Ser Ala His Pro Ile Ser Asp
130 135 140 

ATA TAT GCC TCC GAT AGC ATT TTT CAC CCA ATC GCT GCG TCC TCG GGA 721 
Ile Tyr Ala Ser Asp Ser Ile Phe His Pro Ile Ala Ala Ser Ser Gly
145 150 155 

ACT ATT TCT TCA GAC TGC GAT GTA AAA GGA ATG AAC GAT TTG TCG GTA 769 
Thr Ile Ser Ser Asp Cys Asp Val Lys Gly Met Asn Asp Leu Ser Val
160 165 170 

GAC AGT AAA TTG CAT TAA CTATCCAGAC TTGAAGAGAA AGCTCTTATT 817
Asp Ser Lys Leu His End
175

ATATAATTTT AATTGTTAGA CATAGAGCCG ACATTCTTTG ATCTATCTAA TGAGATAAAA 877

TAATAGATTT TGGATTTATT TGTCATGATC TGTTGCAACA AACGCTGACC CCCCCCATCC 937 

ATGAAGGGGC GTGTCAAATA ACGTGTTGCC TTTTTGTTGT ATATGAAGAT ATTTAATGTG 997 

GCGTTGAGCC TAATGAGAGG AGAACGTGTT TGAATACTGG AGACGAGCGC CGTGTAAGAT 1057 

TAAAACATAT TGGAGAGGT ATG GCC ATG TGG TCT CTA CGG CGC AAA TCT 1106
Met Ala Met Trp Ser Leu Arg Arg Lys Ser
1 5 10

AGC AGG AGT GTG CAA CTC CGG GTA GAT TCT CCA AAA GAA CAG AGT TAT 1154 
Ser Arg Ser Val Gln Leu Arg Val Asp Ser Pro Lys Glu Gln Ser Tyr
15 20 25 

GAT ATA CTT TCT GCC OGC GGG GAA CAT GTT GCG CTA TTG CCT AAA TCT 1202 
Asp lie Leu Ser A^ 7 Gly Glu His Val Ala Leu Leu Pro Lys Ser 
30 35 40 

GTA CGC AGT CTA GCC AGG ACC ATA TTA ACC GCC GCT ACG ATC TCC CAG 1250 
Val Arg Ser Leu Ala Arg Thr Ile Leu Thr Ala Ala Thr Ile Ser Gln
45 50 55 

GCT GCT ATG AAA GCT GGA AAA CCA CCA TCG TCT CGT TTG TGG GGT GAG 1298 
Ala Ala Met Lys Ala Gly Lys Pro Pro Ser Ser Arg Leu Trp Gly Glu 
60 65 70

ATA TTC GAC AGA ATG ACT GTC ACG CTT AAC GAA TAT GAT ATT TCT GCT 1346 
Ile Phe Asp Arg Met Thr Val Thr Leu Asn Glu Tyr Asp Ile Ser Ala
75 80 85 90

TCG CCA TTC CAC CCG ACA GAC CCG ACG AGA AAA ATT GTA GGC CGG GCT 1394 
Ser Pro Phe His Pro Thr Asp Pro Thr Arg Lys Ile Val Gly Arg Ala
95 100 105 

TTA CGG TGT ATT GAA CGT GCT CCT CTT ACA CAC GAA GAA ATG GAC ACT 1442 
Leu Arg Cys Ile Glu Arg Ala Pro Leu Thr His Glu Glu Met Asp Thr
110 115 120 

CGG TTT ACT ATC ATG ATG TAT TGG TGT TGT CTT GGA CAT GCT GGA TAC 1490 
Arg Phe Thr Ile Met Met Tyr Trp Cys Cys Leu Gly His Ala Gly Tyr
125 130 135 

TGT ACT GTT TCG CGC TTA TAT GAG AAG AAT GTC CGT CTT ATG GAC ATA 1538 
Cys Thr Val Ser Arg Leu Tyr Glu Lys Asn Val Arg Leu Met Asp Ile
140 145 150 

GTA GGT TC GCA ACG GGC TGT GGA ATA AGT CCA CTC CCC GAA ATA GAG 1586 
Val Gly Ser Ala Thr Gly Cys Gly Ile Ser Pro Leu Pro Glu Ile Glu
155 160 165 170 

TCT TAT TGG AAA CCT TTA TGT CGT GCC GTC GCT ACT AAG GGG AAT GCA 1634 
Ser Tyr Trp Lys Pro Leu Cys Arg Ala Val Ala Thr Lys Gly Asn Ala 
175 180 185 

GCA ATC GGT GAT GAT GCT GAA TTG GCA CAT TAT CTG ACA AAT CTT CGG 1682
Ala Ile Gly Asp Asp Ala Glu Leu Ala His Tyr Leu Thr Asn Leu Arg
190 195 200 

GAA TCG CCA ACA GGA GAC GGG GAA TCC TAC TTA TAA CTAATCGCAC 1728 
Glu Ser Pro Thr Gly Asp Gly Glu Ser Tyr Leu End 
205 210 

AATTATTAAT AGGATTTTAG GAAAAACTGC TACTAACGTT GTTTAAATAA TAAAATTTTA 1788 

TTTTCAATAA GGCATTACAG TGTTGTCATG ATTGTATGTA TTATATGGGG TATGCATGAG 1848 

GATTACTTCG ATTGAAACTT TGTCTAAATG TCTGTAGGAT TTTACTATTC ATTAGTCTGG 1908

ATCGAGGCGG ACGTAAATGG AGATTGCGGC AAATGTAGGG GTGCTGGTAC ATAAGACCTC 1968 

CAACATCCAT TCGACTCATC GGCCTGCGTC CAAATGGATA TGTTGATGTA CCTTGTAAAG 2028 

TTATGACATT AGAAGATCGA TGGTGAATAG TGGGATCTAT ATCCATGCTA TTCTCAATAT 2088 

TGCATGATAT GCAATGTTCC CGGTTAGGTT TGATAAGATC ATGTATGGTT CTATAATACA 2148 

ACTCCTCTTC AGAAGAATCA TTTATTTTAT GTCCACTGTC CTTGGATATT CCAGTTTCTG 2208 

TCAATCGATT CGCTTGCATT TGCGTGCAGC ATGTCTTGAT GGCATTTCCT ATGCTATCAT 2268 

CCGGCAGGCC TAAGGGTGTT CTATACTCGC ACACAGGTAG AGCAAGAACC ACGGCATATC 2328 

GAGCTACCTC TATTGCCCCG CTAAGGACAT TTCTTGCAGA CTGTATTGTC ATGAACATAT 2388 

TTCGTGTATT GTGTCGATCA TAACCCTTGT TGATTCCTAT GGAAAGCATT GTGGTCCAGT 2448 

TTTCCAGATG AAATGAAAAC AATGCGGGCA AAAATGGTCC CACCTGTTTC ATCTTCAATG 2508 

CATCTCTCAC ATCCCAAGTT CTATAGAATA TTCTCCACTG ACCAGTTTCG GTAAGATCAG 2568 

TTTCTGTAAA ATTTGTGATA GTTTCAATCG AAAACATTTT GTCCATCATG GCAAAAAATC 2628 

TATAGGCAGA CCAGATAACC ATTTGACACC ACATATCCTT GTGTATATCA AACGATGTAA 2688 

TAGATCCCTC GTTAGTAGAT ATGGTACATA AAAGGCCTAA TCTCTCTCGG GCTTCCATAC 2748 

ATTGAACGAT TCCTTCTGTG AATTCATCAA CAACCACATG CCAAAAATTT ACATTAGTAA 2808 

TCTTTCTCGG TGGCTTACCA AATCGTCCTC TTGGTATATC CATATCATCG AACATTGTAG 2868 

CATTGACTCT GCTCATCGTT GTCTTTCAAA TGCGCTCGAT TGTTGAATCT CTCCTGATGT 2928 

TAGAAGTATA TGGAAGATAG CCTGGATACA TAAGTGATCT AGAAGGGTTT GTTATTGCAC 2988 

TAATATACAA ATTATACGTG ACACTATAGC GACGGTTGTA GCGATGCACC TAATCGTAAT 3048 

GTGTATACGC CCCATCATGT AATTATATCT AATTGGTAGC AAGTAGGTCT GTCGAATAAC 3108 

AGCTAATGAC TACCGGCTCT ACATTTTTTC TGTATTCGTG ACTTTCCTGT CGCAGTGTAA 3168 

CGAACCGGAA TTGCAATCGC ATCTCTATCT TCTTTCTTGC AACATTTTCC ACAACAGAAT 3228

AATCTGCCGG GTGTACTACT CATTTGAGGT GGTTCGATTT CCGGAGGTTT TAGAGGATTG 3288

GGTGGGGACC CGAGGATTTT GTATACACAT ACCATATCAC TGTCGCAAAA ATGCGCTCTA 3348 

TCTTCTGGGG TGTCGAACTT CGGTTCCCAT GTAGATGTCA AGAGAGTTTG AATATTGTCG 3408 

GGAATGGCCC ACGGCATACC GGACCAGGTC CCAGACACTT TGATTGCAAG TAACCTTTTT 3468 

GGCAAAGGAA TACATTCGAG CGCAATGGCA CATATATCTG CCGCCCCAAC TATCCACAAG 3528 

CTATGTGGAG CATTACCAGA AACTTCAGAT TCCAACATCA AATATCCAGA TAGAACATCC 3588 

TGCCATTCTG TGGAACATCC TGCAACATCT TCAAATAGCC GCACTATAAA CGAATCCCTA 3648 

GTTCCGGCCA ATCCGGTACC ACGAACTCCA GTTCCATCTG GTGGCTTTGT CCTTACTATC 3708 

GGTCGATGTT GCCGAGGAAG AATTAACATG GGTTTGGCAA AACGGAATAG GTCTGCAGCT 3768 

CTGGCGATTA TGGGCACACC CACATCATCC TGTATTTGTT CCATACATTG CTTTATAAGG 3828 

AATATCCATA AAGTAGATGC AGCATCTCTA GATCTTCCTG GCAATCGATC GCATTCATCT 3888 

AGAAGTGTGA CTATAGTTAT CATGGACACA CCCATCTTCA CCTCCACCAA TAATCTTTTT 3948 

TATTGTTAAT AACTGGGCCG GTCTGATCTC CAAATCTTAT ACTCTGGTAG AATATGAAAC 4008

AGGGTTAAAA CTAGGTAATA GACTGGATGT CTTCGACTCC GGAGGCAGAA ACG ATG 4064
Met

GAA TGT GGC ATT TCT TCG TCG AAA GTA CAC GAC TCT AAA ACT AAT ACT 4112 
Glu Cys Gly Ile Ser Ser Ser Lys Val His Asp Ser Lys Thr Asn Thr
5 10 15 

ACC TAC GGA ATT ATA CAT AAC AGC ATC AAT GGT ACG GAT ACG ACG TTG 4160 
Thr Tyr Gly Ile Ile His Asn Ser Ile Asn Gly Thr Asp Thr Thr Leu
20 25 30 

TTT GAT ACT TTT CCC GAC AGT ACC GAT AAC GCG GAA GTG ACG GGG GAT 4208 
Phe Asp Thr Phe Pro Asp Ser Thr Asp Asn Ala Glu Val Thr Gly Asp 
35 40 45 

GTG GAC GAT GTG AAG ACT GAG AGC TCT CCC GAG TCC CAA TCT GAA GAT 4256 
Val Asp Asp Val Lys Thr Glu Ser Ser Pro Glu Ser Gln Ser Glu Asp
50 55 60 65 

TTG TCA CCT TTT GGG AAC GAT GGA AAT GAA TCC CCC GAA ACG GTG ACG 4304 
Leu Ser Pro Phe Gly Asn Asp Gly Asn Glu Ser Pro Glu Thr Val Thr 
70 75 80 

GAC ATT GAT GCA GTT TCA GCT GTG CGA ATG CAG TAT AAC ATT GTT TCA 4352 
Asp Ile Asp Ala Val Ser Ala Val Arg Met Gln Tyr Asn Ile Val Ser
85 90 95 

TCG TTA CCG CCC GGA TCT GAA GGG TAT ATC TAT GTT TGT ACA AAG CGT 4400 
Ser Leu Pro Pro Gly Ser Glu Gly Tyr Ile Tyr Val Cys Thr Lys Arg
100 105 110

GGG GAT AAT ACC AAG AGA AAA GTC ATT GTG AAA GCT GTG ACT GGT GGC 4448
Gly Asp Asn Thr Lys Arg Lys Val Ile Val Lys Ala Val Thr Gly Gly
115 120 125 

AAA ACC CTT GGG AGT GAA ATT GAT ATA TTA AAA AAA ATG TCT CAC CGC 4496
Lys Thr Leu Gly Ser Glu Ile Asp Ile Leu Lys Lys Met Ser His Arg
130 135 140 145

TCC ATA ATT AGA TTA GTT CAT GCT TAT AGA TGG AAA TCG ACA GTT TGT 4544 
Ser Ile Ile Arg Leu Val His Ala Tyr Arg Trp Lys Ser Thr Val Cys
150 155 160 

ATG GTA ATG CCT AAA TAC AAA TGC GAC TTG TTT ACG TAC ATA GAT ATC 4592 
Met Val Met Pro Lys Tyr Lys Cys Asp Leu Phe Thr Tyr Ile Asp Ile
165 170 175

ATG GGA CCA TTG CCA CTA AAT CAA ATA ATT ACG ATA GAA CGG GGT TTG 4640 
Met Gly Pro Leu Pro Leu Asn Gln Ile Ile Thr Ile Glu Arg Gly Leu
180 185 190 

CTT GGA GCA TTG GCA TAT ATC CAC GAA AAG GGT ATA ATA CAT CGT GAT 4688 
Leu Gly Ala Leu Ala Tyr Ile His Glu Lys Gly Ile Ile His Arg Asp
195 200 205 

GTA AAA ACT GAA AAT ATA TTT TTG GAT AAA CCT GAA AAT GTA GTA TTG 4736 
Val Lys Thr Glu Asn Ile Phe Leu Asp Lys Pro Glu Asn Val Val Leu
210 215 220 225

GGG GAC TTT GGG GCA GCA TGT AAA TTA GAT GAA CAT ACA GAT AAA CCC 4784 
Gly Asp Phe Gly Ala Ala Cys Lys Leu Asp Glu His Thr Asp Lys Pro 
230 235 240 

AAA TGT TAT GGA TGG AGT GGA ACT CTG GAA ACC AAT TCG CCT GAA CTG 4832 
Lys Cys Tyr Gly Trp Ser Gly Thr Leu Glu Thr Asn Ser Pro Glu Leu 
245 250 255 

CTT GCA CTT GAT CCA TAC TGT ACA AAA ACT GAT ATA TGG AGT GCA GGA 4880 
Leu Ala Leu Asp Pro Tyr Cys Thr Lys Thr Asp Ile Trp Ser Ala Gly
260 265 270 

TTA GTT CTG TTT GAG ATG TCA GTA AAA AAT ATA ACC TTT TTT GGC AAA 4928 
Leu Val Leu Phe Glu Met Ser Val Lys Asn Ile Thr Phe Phe Gly Lys
275 280 285 

CAA GTA AAC GGC TCA GGT TCT CAG CTG AGA TCC ATA ATT AGA TGC CTG 4976 
Gln Val Asn Gly Ser Gly Ser Gln Leu Arg Ser Ile Ile Arg Cys Leu
290 295 300 305 

CAA GTC CAT CCG TTG GAA TTT CCA CAG AAC AAT TCT ACA AAC TTA TGC 5024 
Gln Val His Pro Leu Glu Phe Pro Gln Asn Asn Ser Thr Asn Leu Cys
310 315 320 

AAA CAC TTC AAG CAG TAC GCG ATT CAG TTA CGA CAT CCA TAT GCA ATC 5072 
Lys His Phe Lys Gln Tyr Ala Ile Gln Leu Arg His Pro Tyr Ala Ile
325 330 335

CCT CAG ATT ATA CGA AAG AGT GGT ATG ACG ATG GAT CTT GAA TAT GCT 5120
Pro Gln Ile Ile Arg Lys Ser Gly Met Thr Met Asp Leu Glu Tyr Ala
340 345 350

ATT GCA AAA ATG CTC ACA TTC GAT CAG GAG TTT AGA CCA TCT GCC CAA 5168 
Ile Ala Lys Met Leu Thr Phe Asp Gln Glu Phe Arg Pro Ser Ala Gln
355 360 365 

GAT ATT TTA ATG TTG CCT CTT TTT ACT AAA GAA CCC GCT GAC GCA TTA 5216 
Asp Ile Leu Met Leu Pro Leu Phe Thr Lys Glu Pro Ala Asp Ala Leu
370 375 380 385 

TAC ACG ATA ACT GCC GCT CAT ATG TAA ACACCCGTCA AAAATAACTT 5263 
Tyr Thr Ile Thr Ala Ala His Met End
390 

CAATGATTCA TTTTATAATA TATACTACGC GTTACCTGCA ATAATGACAA CATTCGAAGT 5323 

CTTTGAAGAT TCGCAGACCT TTTTTGCGA ATG GCA CCT TCG GGA CCT ACG CCA 5376
Met Ala Pro Ser Gly Pro Thr Pro
1 5 

TAT TCC CAC AGA CCG CAA ATA AAG CAT TAT GGA ACA TTT TCG GAT TGC 5424 
Tyr Ser His Arg Pro Gln Ile Lys His Tyr Gly Thr Phe Ser Asp Cys
10 15 20 

ATG AGA TAT ACT CTA Ax. GAT GAG AGT AAG CTA GAT GAT AGA TGT TCA 5472 
Met Arg Tyr Thr Leu Asn Asp Glu Ser Lys Val Asp Asp Arg Cys Ser 
25 30 35 40 

GAC ATA CAT AAC TCC TTA GCA CAA TCC AAT GTT ACT TCA AGC ATG TCT 5520 
Asp Ile His Asn Ser Leu Ala Gln Ser Asn Val Thr Ser Ser Met Ser
45 50 55 

GTA ATG AAC GAT TCG GAA GAA TGT CCA TTA ATA AAT GGA CCT TCG ATG 5568 
Val Met Asn Asp Ser Glu Glu Cys Pro Leu Ile Asn Gly Pro Ser Met
60 65 70 

CAG GCA GAG GAC CCT AAA AGT GTT TTT TAT AAA GTT CGT AAG CCT GAC 5616
Gln Ala Glu Asp Pro Lys Ser Val Phe Tyr Lys Val Arg Lys Pro Asp
75 80 85 

GGA AGT CGT GAT TTT TCA TGG CAA AAT CTG AAC TCC CAT GGC AAT AGT 5664 
Arg Ser Arg Asp Phe Ser Trp Gln Asn Leu Asn Ser His Gly Asn Ser
90 95 100 

GGT CTA CGT CGT GAA AAA TAT ATA CGT TCC TCT AAG AGG CGA TGG AAG 5712 
Gly Leu Arg Arg Glu Lys Tyr Ile Arg Ser Ser Lys Arg Arg Trp Lys
105 110 115 120 

AAT CCC GAG ATA TTT AAG GTA TCT TTG AAA TGT GAA TCA ATT GGC GCT 5760 
Asn Pro Glu Ile Phe Lys Val Ser Leu Lys Cys Glu Ser Ile Gly Ala
125 130 135 

GGT AAC GGA ATA AAA ATT TCA TTC TCA TTT TTC TAA CATTATAATA 5806 
Gly Asn Gly Ile Lys Ile Ser Phe Ser Phe Phe End
140 145

TATCAGATCG TTTCTTATAT ACTTATTTTC ATCGTCGGGA TATGACTAAC GTATACTAAG 5866

TTACAAGAAA CAACTGCTTA ACGTCGAACA TAACGGAAAT AAAAATATAT ATAGCGTCTC 5926 

CTATAACTGT TATATTGGCA CCTTTTAGAG CTTCGGT ATG AAT AGA TAC AGA TAT 5981
Met Asn Arg Tyr Arg Tyr
-30 -25 

GAA AGT ATT TTT TTT AGA TAT ATC TCA TCC ACG AGA ATG ATT CTT ATA 6029 
Glu Ser Ile Phe Phe Arg Tyr Ile Ser Ser Thr Arg Met Ile Leu Ile
-20 -15 -10

ATC TGT TTA CTT TTG GGA ACT GGG GAC ATG TCC GCA ATG GGA CTT AAG 6077 
Ile Cys Leu Leu Leu Gly Thr Gly Asp Met Ser Ala Met Gly Leu Lys
-5 1 5 

AAA GAC AAT TCT CCG ATC ATT CCC ACA TTA CAT CCG AAA GGT AAT GAA 6125 
Lys Asp Asn Ser Pro Ile Ile Pro Thr Leu His Pro Lys Gly Asn Glu
10 15 20 

AAC CTC CGG GCT ACT CTC AAT GAA TAC AAA ATC CCG TCT CCA CTG TTT 6173 
Asn Leu Arg Ala Thr Leu Asn Glu Tyr Lys Ile Pro Ser Pro Leu Phe
25 30 35 40 

GAT ACA CTT GAC AAT TCA TAT GAG ACA AAA CAC GTA ATA TAT ACG GAT 6221 
Asp Thr Leu Asp Asn Ser Tyr Glu Thr Lys His Val Ile Tyr Thr Asp
45 50 55

AAT TGT AGT TTT GCT GTT TTG AAT CCA TTT GGC GAT CCG AAA TAT ACG 6269 
Asn Cys Ser Phe Ala Val Leu Asn Pro Phe Gly Asp Pro Lys Tyr Thr
60 65 70

CTT CTC AGT TTA CTG TTG ATG GGA CGA CGC AAA TAT GAT GCT CTA GTA 6317 
Leu Leu Ser Leu Leu Leu Met Gly Arg Arg Lys Tyr Asp Ala Leu Val 
75 80 85 

GCA TGG TTT GTC TTG GGC AGA GCA TGT GGG AGA CCA ATT TAT TTA CGT 6365 
Ala Trp Phe Val Leu Gly Arg Ala Cys Gly Arg Pro Ile Tyr Leu Arg
90 95 100 

GAA TAT GCC AAC TGC TCT ACT AAT GAA CCA TTT GGA ACT TGT AAA TTA 6413 
Glu Tyr Ala Asn Cys Ser Thr Asn Glu Pro Phe Gly Thr Cys Lys Leu 
105 110 115 120

AAG TCC CTA GGA TGG TGG GAT AGA AGA TAT GCA ATG ACG AGT TAT ATC 6461 
Lys Ser Leu Gly Trp Trp Asp Arg Arg Tyr Ala Met Thr Ser Tyr Ile
125 130 135 

GAT CGA GAT GAA TTG AAA TTG ATT ATT GCA GCA CCC AGT CGT GAG CTA 6509 
Asp Arg Asp Glu Leu Lys Leu Ile Ile Ala Ala Pro Ser Arg Glu Leu
140 145 150

AGT GGA TTA TAT ACG CGT TTA ATA ATT ATT AAT GGA GAA CCC ATT TCG 6557 
Ser Gly Leu Tyr Thr Arg Leu Ile Ile Ile Asn Gly Glu Pro Ile Ser
155 160 165

AGT GAC ATA TTA CTG ACT GTT AAA GGA ACA TGT AGT TTT TCG AGA CGG 6605
Ser Asp Ile Leu Leu Thr Val Lys Gly Thr Cys Ser Phe Ser Arg Arg
170 175 180 

GGG ATA AAG GAT AAC AAA CTA TGC AAA CCG TTC AGT TTT TTT GTC AAT 6653 
Gly Ile Lys Asp Asn Lys Leu Cys Lys Pro Phe Ser Phe Phe Val Asn
185 190 195 200 

GGT ACA ACA CGG CTG TTA GAC ATG GTG CGA ACA GGA ACC CCG AGA GCC 6701 
Gly Thr Thr Arg Leu Leu Asp Met Val Arg Thr Gly Thr Pro Arg Ala 
205 210 215 

CAT GAA GAA AAT GTG AAG CAG TGG CTT GAA CGA AAT GGT GGT AAA CAT 6749 
His Glu Glu Asn Val Lys Gln Trp Leu Glu Arg Asn Gly Gly Lys His
220 225 230 

CTA CCA ATC GTC GTC GAA ACA TCT ATG CAA CAA GTC TCA AAT TTG CCG 6797 
Leu Pro Ile Val Val Glu Thr Ser Met Gln Gln Val Ser Asn Leu Pro
235 240 245 

AGA AGT TTT AGA GAT TCA TAT TTA AAA TCA CCT GAC GAC GAT AAA TAT 6845 
Arg Ser Phe Arg Asp Ser Tyr Leu Lys Ser Pro Asp Asp Asp Lys Tyr 
250 255 260 

AAT GAC GTC AAA ATG ACA TCG GCC ACT ACT AAT AAC ATT ACC ACC TCC 6893 
Asn Asp Val Lys Met Thr Ser Ala Thr Thr Asn Asn Ile Thr Thr Ser
265 270 275 280

GTG GAT GGT TAC ACT GGA CTC ACT AAT CGG CCC GAG GAC TTT GAG AAA 6941 
Val Asp Gly Tyr Thr Gly Leu Thr Asn Arg Pro Glu Asp Phe Glu Lys
285 290 295

GCA CCA TAC ATA ACT AAA CGA CCG ATA ATC TCT GTC GAG GAG GCA TCC 6989 
Ala Pro Tyr Ile Thr Lys Arg Pro Ile Ile Ser Val Glu Glu Ala Ser
300 305 310 

AGT CAA TCA CCT AAA ATA TCA ACA GAA AAA AAA TCC CGA ACG CAA ATA 7037 
Ser Gln Ser Pro Lys Ile Ser Thr Glu Lys Lys Ser Arg Thr Gln Ile
315 320 325 

ATA ATT TCA CTA GTT GTT CTA TGC GTC ATG TTT TGT TTC ATT GTA ATC 7085 
Ile Ile Ser Leu Val Val Leu Cys Val Met Phe Cys Phe Ile Val Ile
330 335 340 

GGG TCT GGT ATA TGG ATC CTT CGC AAA CAC CGC AAA ACG GTG ATG TAT 7133 
Gly Ser Gly Ile Trp Ile Leu Arg Lys His Arg Lys Thr Val Met Tyr
345 350 355 360 

GAT AGA CGT CGT CCA TCA AGA CGG GCA TAT TCC CGC CTA TAA 7175 
Asp Arg Arg Arg Pro Ser Arg Arg Ala Tyr Ser Arg Leu End 
365 370 

CACGTGTTTG GTATGGGCGT GTCGCTATAG TGCATAAGAA GTTGACTACA TTGATCAATG 7235

ACATTATATA GCTTCTTTGG TCAGATAGAC GGCGTGTGTG ATTGCG ATG TAT GTA 7290
Met Tyr Val

CTA CAA TTA TTA TTT TGG ATC CGC CTC TTT CGA GG-. ,TC TGG TCT ATA 7338
Leu Gln Leu Leu Phe Trp Ile Arg Leu Phe Arg Gly Ile Trp Ser Ile
-15 -10 -5 1 

GTT TAT ACT GGA ACA TCT GTT ACG TTA TCA ACG GAC CAA TCT GCT CTT 7386 
Val Tyr Thr Gly Thr Ser Val Thr Leu Ser Thr Asp Gln Ser Ala Leu
5 10 15 

GTT GCG TTC CGC GGA TTA GAT AAA ATG GTG AAT GTA CGC GGC CAA CTT 7434 
Val Ala Phe Arg Gly Leu Asp Lys Met Val Asn Val Arg Gly Gln Leu
20 25 30 

TTA TTC CTG GGC GAC CAG ACT CGG ACG AGT TCT TAT ACA GGA ACG ACG 7482
Leu Phe Leu Gly Asp Gln Thr Arg Thr Ser Ser Tyr Thr Gly Thr Thr
35 40 45 

GAA ATC TTG AAA TGG GAT GAA GAA TAT AAA TGC TAT TCC GTT CTA CAT 7530 
Glu Ile Leu Lys Trp Asp Glu Glu Tyr Lys Cys Tyr Ser Val Leu His
50 55 60 65 

GCG ACA TCA TAT ATG GAT TGT CCT GCT ATA GAC GCC ACG GTA TTC AGA 7578 
Ala Thr Ser Tyr Met Asp Cys Pro Ala Ile Asp Ala Thr Val Phe Arg
70 75 80 

GGC TGT AGA GAC GCT GTG GTA TAT GCT CAA CCT CAT GGT AGA GTA CAA 7626 
Gly Cys Arg Asp Ala Val Val Tyr Ala Gln Pro His Gly Arg Val Gln
85 90 95 

CCT TTT CCC GAA AAG GGA ACA TTG TTG AGA ATT GTC GAA CCC AGA GTA 7674
Pro Phe Pro Glu Lys Gly Thr Leu Leu Arg Ile Val Glu Pro Arg Val
100 105 110 

TCA GAT ACA GGC AGC TAT TAC ATA CGT GTA TCT CTC GCT GGA AGA AAT 7722 
Ser Asp Thr Gly Ser Tyr Tyr Ile Arg Val Ser Leu Ala Gly Arg Asn
115 120 125 

ATG AGC GAT ATA TTT AGA ATG GTT GTT ATT ATA AGG AGT AGC AAA TCT 7770 
Met Ser Asp Ile Phe Arg Met Val Val Ile Ile Arg Ser Ser Lys Ser
130 135 140 145 

TGG GCC TGT AAT CAC TCT GCT AGT TCA TTT CAG GCC CAT AAA TGT ATT 7818 
Trp Ala Cys Asn His Ser Ala Ser Ser Phe Gln Ala His Lys Cys Ile
150 155 160 

CGC TAT GTC GAC CGT ATG GCC TTT GAA AAT TAT CTG ATT GGA CAT GTA 7866 
Arg Tyr Val Asp Arg Met Ala Phe Glu Asn Tyr Leu Ile Gly His Val
165 170 175 

GGC AAT TTG CTG GAC AGT GAC TCG GAA TTG CAT GCA ATT TAT AAT ATT 7914
Gly Asn Leu Leu Asp Ser Asp Ser Glu Leu His Ala Ile Tyr Asn Ile
180 185 190 

ACT CCC CAA TCC ATT TCC ACA GAT ATT AAT ATT GTA ACG ACT CCA TTT 7962
Thr Pro Gln Ser Ile Ser Thr Asp Ile Asn Ile Val Thr Thr Pro Phe
195 200 205

TAC GAT AAT TCG GGA ACA ATT TAT TCA CCT ACG GTT TTT AAT TTG TTT 8010
Tyr Asp Asn Ser Gly Thr Ile Tyr Ser Pro Thr Val Phe Asn Leu Phe
210 215 220 225

AAT AAC AAT TCC CAT GTC GAT GCA ATG AAT TCG ACT GGT ATG TGG AAT 8058
Asn Asn Asn Ser His Val Asp Ala Met Asn Ser Thr Gly Met Trp Asn
230 235 240

ACC GTT TTA AAA TAT ACC CTT CCA AGG CTT ATT TAC TTT TCT ACG ATG 8106
Thr Val Leu Lys Tyr Thr Leu Pro Arg Leu Ile Tyr Phe Ser Thr Met
245 250 255

ATT GTA CTA TGT ATA ATA GCA TTG GCA ATT TAT TTG GTC TGT GAA AGG 8154
Ile Val Leu Cys Ile Ile Ala Leu Ala Ile Tyr Leu Val Cys Glu Arg
260 265 270

TGC CGC TCT CCC CAT CGT AGG ATA TAC ATC GGT GAA CCA AGA TCT GAT 8202
Cys Arg Ser Pro His Arg Arg Ile Tyr Ile Gly Glu Pro Arg Ser Asp
275 280 285

GAG GCC CCA CTC ATC ACT TCT GCA GTT AAC GAA TCA TTT CAA TAT GAT 8250
Glu Ala Pro Leu Ile Thr Ser Ala Val Asn Glu Ser Phe Gln Tyr Asp
290 295 300 305

TAT AAT GTA AAG GAA ACT CCT TCA GAT GTT ATT GAA AAG GAG TTG ATG 8298
Tyr Asn Val Lys Glu Thr Pro Ser Asp Val Ile Glu Lys Glu Leu Met
310 315 320

GAA AAA CTG AAG AAG AAA GTC GAA TTG TTG GAA AGA GAA GAA TGT GTA 8346
Glu Lys Leu Lys Lys Lys Val Glu Leu Leu Glu Arg Glu Glu Cys Val
325 330 335

TAG GTTTGAGAAA CTATTATAGG TAGGTGGTAC CTGTTAGCTT AGTATAAGGG 8399
End

GAGGAGCCGT TTCTTGTTTT AAAGACACGA ACACAAGGCC GTAAGTTTTA TATGTGAATT 8459

TTGTGCATGT CTGCGAGTCA GCGTCATA ATG TGT GTT TTC CAA ATC CTG ATA 8511
Met Cys Val Phe Gln Ile Leu Ile
-15

ATA GTG ACG ACG ATC AAA GTA GCT GGA ACG GCC AAC ATA AAT CAT ATA 8559
Ile Val Thr Thr Ile Lys Val Ala Gly Thr Ala Asn Ile Asn His Ile
-10 -5 1 5

GAC GTT CCT GCA GGA CAT TCT GCT ACA ACG ACG ATC CCG CGA TAT CCA 8607
Asp Val Pro Ala Gly His Ser Ala Thr Thr Thr Ile Pro Arg Tyr Pro
10 15 20

CCA GTT GTC GAT GGG ACC CTT TAC ACC GAG ACG TGG ACA TGG ATT CCC 8655
Pro Val Val Asp Gly Thr Leu Tyr Thr Glu Thr Trp Thr Trp Ile Pro
25 30 35

AAT CAC TGC AAC GAA ACG GCA ACA GGC TAT GTA TGT CTG GAA AGT GCT 8703
Asn His Cys Asn Glu Thr Ala Thr Gly Tyr Val Cys Leu Glu Ser Ala
40 45 50

CAC TGT TTT ACC GAT TTG ATA TTA GGA GTA TCC TGC ATG AGG TAT GCG 8751
His Cys Phe Thr Asp Leu Ile Leu Gly Val Ser Cys Met Arg Tyr Ala
55 60 65 70

GAT GAA ATC GTC TTA CGA ACT GAT AAA TTT ATT GTC GAT GCG GGA TCC 8799 
Asp Glu Ile Val Leu Arg Thr Asp Lys Phe Ile Val Asp Ala Gly Ser
75 80 85 

ATT AAA CAA ATA GAA TCG CTA AGT CTG AAT GGA GTT CCG AAT ATA TTC 8847 
Ile Lys Gln Ile Glu Ser Leu Ser Leu Asn Gly Val Pro Asn Ile Phe
90 95 100 

CTA TCT ACG AAA GCA AGT AAC AAG TTG GAG ATA CTA AAT GCT AGC CTA 8895 
Leu Ser Thr Lys Ala Ser Asn Lys Leu Glu Ile Leu Asn Ala Ser Leu
105 110 115 

CAA AAT GCG GGT ATC TAC ATT CGG TAT TCT AGA AAT GGG ACG AGG ACT 8943 
Gln Asn Ala Gly Ile Tyr Ile Arg Tyr Ser Arg Asn Gly Thr Arg Thr
120 125 130 

GCA AAG CTG GAT GTT GTT GTG GTT GGC GTT TTG GGT CAA GCA AGG GAT 8991 
Ala Lys Leu Asp Val Val Val Val Gly Val Leu Gly Gln Ala Arg Asp
135 140 145 150 

CGC CTA CCC CAA ATG TCC AGT CCT ATG ATC TCA TCC CAC GCC GAT ATC 9039 
Arg Leu Pro Gln Met Ser Ser Pro Met Ile Ser Ser His Ala Asp Ile
155 160 165 

AAG TTG TCA TTA AAA AAC TTT AAA GCA TTA GTA TAT CAC GTG GGA GAT 9087 
Lys Leu Ser Leu Lys Asn Phe Lys Ala Leu Val Tyr His Val Gly Asp 
170 175 180 

ACT ATC AAT GTC TCG ACG GCG GTT ATA CTA GGA CCT TCT CCG GAG ATA 9135 
Thr Ile Asn Val Ser Thr Ala Val Ile Leu Gly Pro Ser Pro Glu Ile
185 190 195 

TTC ACA TTG GAA TTT AGG GTG TTG TTC CTC CGT TAT AAT CCA ACG TGC 9183 
Phe Thr Leu Glu Phe Arg Val Leu Phe Leu Arg Tyr Asn Pro Thr Cys 
200 205 210 

AAG TTC GTC ACG ATT TAT GAA CCT TGT ATA TTT CAC CCC AAA GAA CCA 9231 
Lys Phe Val Thr Ile Tyr Glu Pro Cys Ile Phe His Pro Lys Glu Pro
215 220 225 230 

GAG TGT ATT ACT ACT GCA GAA CAA TCG GTA TGT CAT TTC GCA TCC AAC 9279 
Glu Cys Ile Thr Thr Ala Glu Gln Ser Val Cys His Phe Ala Ser Asn
235 240 245 

ATT GAC ATT CTG CAG ATA GCC GCC GCA CGT TCT GAA AAT TGT AGC ACA 9327 
Ile Asp Ile Leu Gln Ile Ala Ala Ala Arg Ser Glu Asn Cys Ser Thr
250 255 260 

GGG TAT CGT AGA TGT ATT TAT GAC ACG GCT ATC GAT GAA TCT GTG CAG 9375 
Gly Tyr Arg Arg Cys Ile Tyr Asp Thr Ala Ile Asp Glu Ser Val Gln
265 270 275 

GCC AGA TTA ACA TTC ATA GAA CCA GGA ATT CCT TCC TTT AAA ATG AAA 9423 
Ala Arg Leu Thr Phe Ile Glu Pro Gly Ile Pro Ser Phe Lys Met Lys
280 285 290

GAT GTC CAG GTA GAC GAT GCT GGA TTG TAT GTG GTT GTG GCT TTA TAC 9471
Asp Val Gln Val Asp Asp Ala Gly Leu Tyr Val Val Val Ala Leu Tyr
295 300 305 310

AAT GGA CGT CCA AGT GCA TGG ACT TAC ATT TAT TTG TCA ACG GTG GAA 9519
Asn Gly Arg Pro Ser Ala Trp Thr Tyr Ile Tyr Leu Ser Thr Val Glu
315 320 325

ACA TAT CTT AAT GTA TAT GAA AAC TAC CAC AAG CCG GGA TTT GGG TAT 9567
Thr Tyr Leu Asn Val Tyr Glu Asn Tyr His Lys Pro Gly Phe Gly Tyr
330 335 340

AAA TCA TTT CTA CAG AAC AGT AGT ATC GTC GAC GAA AAT GAG GCT AGC 9615
Lys Ser Phe Leu Gln Asn Ser Ser Ile Val Asp Glu Asn Glu Ala Ser
345 350 355

GAT TGG TCC AGC TCG TCC ATT AAA CGG AGA AAT AAT GGT ACT ATC ATT 9663
Asp Trp Ser Ser Ser Ser Ile Lys Arg Arg Asn Asn Gly Thr Ile Ile
360 365 370

TAT GAT ATT TTA CTC ACA TCG CTA TCA ATT GGG GCG ATT ATT ATC GTC 9711
Tyr Asp Ile Leu Leu Thr Ser Leu Ser Ile Gly Ala Ile Ile Ile Val
375 380 385 390

ATA GTA GGG GGT GTT TGT ATT GCC ATA TTA ATT AGG CGT AGG AGA CGA 9759
Ile Val Gly Gly Val Cys Ile Ala Ile Leu Ile Arg Arg Arg Arg Arg
395 410 415

CGT CGC ACG AGG GGG TTA TTC GAT GAA TAT CCC AAA TAT ATG ACG CTA 9807
Arg Arg Thr Arg Gly Leu Phe Asp Glu Tyr Pro Lys Tyr Met Thr Leu
420 425 430

CCA GGA AAC GAT CTG GGG GGC ATG AAT GTA CCG TAT GAT AAT ACA TGC 9855
Pro Gly Asn Asp Leu Gly Gly Met Asn Val Pro Tyr Asp Asn Thr Cys
435 440 445

TCT GGT AAC CAA GTT GAA TAT TAT CAA GAA AAG TCG GCT AAA ATG AAA 9903
Ser Gly Asn Gln Val Glu Tyr Tyr Gln Glu Lys Ser Ala Lys Met Lys
450 455 460

AGA ATG GGT TCG GGT TAT ACC GCT TGG CTA AAA AAT GAT ATG CCG AAA 9951
Arg Met Gly Ser Gly Tyr Thr Ala Trp Leu Lys Asn Asp Met Pro Lys
465 470 475 480

ATT AGG AAA CGC TTA GAT TTA TAC CAC TGA TATGTACATA TTTAAACTTA 10001
Ile Arg Lys Arg Leu Asp Leu Tyr His End
485

ATGGGATATA GTATATGGAC GTCTATATGA CGAGAGTAAA TAAACTGACA ATGCAAATGA 10061
AGCTGATCTA TATTGTGCTT TATATTGGGA CAAACCACTC GCACAAGCTC ATTCAACACA 10121
TCCACTCTTG CTATTAAATT CCCCATTATA TAACAATACT GACATAACAC TCATATTAAG 10181
GGGAGAAAAT AAATATGCAT GGCCGATCAT ATTTTATTGA GATCCGAAAA TATATCATGC 10241
AAATAAGCAT GTTCTAGCAC CACTGCAACA TGTGGTTTAT CGATTTCCGG AAAGAATAGT 10301
TGAACCATTG CCTCCGAGCA GTTGGCGATC CGTTGACCTG CAGGTCGAC 10350 
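Each codon row in SEQ ID NO: 1 is printed above its three-letter translation. That pairing can be reproduced mechanically; the sketch below assumes the standard genetic code (NCBI translation table 1) and this listing's convention of writing a stop codon as `End`. The function names are illustrative and do not come from the source.

```python
# Standard genetic code (NCBI translation table 1). Codons are indexed in
# TCAG order: index = 16 * first + 4 * second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",  # the listing writes stop codons as "End"
}


def translate_codon(codon: str) -> str:
    """Return the three-letter name for one codon, e.g. 'AAG' -> 'Lys'."""
    first, second, third = (BASES.index(base) for base in codon.upper())
    return THREE_LETTER[AMINO_ACIDS[16 * first + 4 * second + third]]


def translate_row(row: str) -> str:
    """Translate a space-separated row of codons as printed in the listing."""
    return " ".join(translate_codon(codon) for codon in row.split())


if __name__ == "__main__":
    print(translate_row("AAG TTG TCA TTA AAA AAC TTT AAA GCA TTA GTA TAT CAC GTG GGA GAT"))
```

Applied to the row ending at position 9087 this yields the printed `Lys Leu Ser Leu ... Gly Asp`, and the last coding row ends in `His End` because TGA is a stop codon.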
(3) Information for SEQ ID NO: 2

(i) Sequence Characteristics: 

(A) Length: 10,350 base pairs 

(B) Type: nucleic acid 

(C) Strandedness: double 
(D) Topology: linear

(ii) Molecule Type: 

(A) Description: genomic DNA 

(iii) HYPOTHETICAL: Yes 

(iv) ANTI-SENSE: Yes

(vi) ORIGINAL SOURCE: 

(A) Organism: MDV, GA strain 

(vii) IMMEDIATE SOURCE: 

(A) Library: genomic 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GTCGACCTGC AGGTCAACGG ATCGCCAACT GCTCGGAGGC AATGGTTCAA CTATTCTTTC 60 

CGGAAATCGA TAAACCACAT GTTGCAGTGG TGCTAGAACA TGCTTATTTG CATGATATAT 120 

TTTCGGATCT CAATAAAATA TGATCGGCCA TGCATATTTA TTTTCTCCCC TTAATATGAG 180 

TGTTATGTCA GTATTGTTAT ATAATGGGGA ATTTAATAGC AAGAGTGGAT GTGTTGAATG 240 

AGCTTGTGCG AGTGGTTTGT CCCAATATAA AGCACAATAT AGATCAGCTT CATTTGCATT 300 

GTCAGTTTAT TTACTCTCGT CATATAGACG TCCATATACT ATATCCCATT AAGTTTAAAT 360 

ATGTACATAT CAGTGGTATA AATCTAAGCG TTTCCTAATT TTCGGCATAT CATTTTTTAG 420 

CCAAGCGGTA TAACCCGAAC CCATTCTTTT CATTTTAGCC GACTTTTCTT GATAATATTC 480 

AACTTGGTTA CCAGAGCATG TATTATCATA CGGTACATTC ATGCCCCCCA GATCGTTTCC 540 

TGGTAGCGTC ATATATTTGG GATATTCATC GAATAACCCC CTCGTGCGAC GTCGTCTCCT 600 

ACGCCTAATT AATATGGCAA TACAAACACC CCCTACTATG ACGATAATAA TCGCCCCAAT 660 

TGATAGCGAT GTGAGTAAAA TATCATAAAT GATAGTACCA TTATTTCTCC GTTTAATGGA 720 

CGAGCTGGAC CAATCGCTAG CCTCATTTTC GTCGACGATA CTACTGTTCT GTAGAAATGA 780 

TTTATACCCA AATCCCGGCT TGTGGTAGTT TTCATATACA TTAAGATATG TTTCCACCGT 840 

TGACAAATAA ATGTAAGTCC ATGCACTTGG ACGTCCATTG TATAAAGCCA CAACCACATA 900 

CAATCCAGCA TCGTCTACCT GGACATCTTT CATTTTAAAG GAAGGAATTC CTGGTTCTAT 960 

GAATGTTAAT CTGGCCTGCA CAGATTCATC GATAGCCGTG TCATAAATAC ATCTACGATA 1020 

CCCTGTGCTA CAATTTTCAG AACGTGCGGC GGCTATCTGC AGAATGTCAA TGTTGGATGC 1080 

GAAATGACAT ACCGATTGTT CTGCAGTAGT AATACACTCT GGTTCTTTGG GGTGAAATAT 1140 
ACAAGGTTCA TAAATCGTGA CGAACTTGCA CGTTGGATTA TAACGGAGGA ACAACACCCT 1200 
AAATTCCAAT GTGAATATCT CCGGAGAAGG TCCTAGTATA ACCGCCGTCG AGACATTGAT 1260 
AGTATCTCCC ACGTGATATA CTAATGCTTT AAAGTTTTTT AATGACAACT TGATATCGGC 1320 
GTGGGATGAG ATCATAGGAC TGGACATTTG GGGTAGGCGA TCCCTTGCTT GACCCAAAAC 1380 
GCCAACCACA ACAACATCCA GCTTTGCAGT CCTCGTCCCA TTTCTAGAAT ACCGAATGTA 1440 
GATACCCGCA TTTTGTAGGC TAGCATTTAG TATCTCCAAC TTGTTACTTG CTTTCGTAGA 1500 
TAGGAATATA TTCGGAACTC CATTCAGACT TAGCGATTCT ATTTGTTTAA TGGATCCCGC 1560 
ATCGACAATA AATTTATCAG TTCGTAAGAC GATTTCATCC GCATACCTCA TGCAGGATAC 1620
TCCTAATATC AAATCGGTAA AACAGTGAGC ACTTTCCAGA CATACATAGC CTGTTGCCGT 1680 
TTCGTTGCAG TGATTGGGAA TCCATGTCCA CGTCTCGGTG TAAAGGGTCC CATCGACAAC 1740

TGGTGGATAT CGCGGGATCG TCGTTGTAGC AGAATGTCCT GCAGGAACGT CTATATGATT 1800 

TATGTTGGCC GTTCCAGCTA CTTTGATCGT CGTCACTATT ATCAGGATTT GGAAAACACA 1860 

CATTATGACG CTGACTCGCA GACATGCACA AAATTCACAT ATAAAACTTA CGGCCTTGTG 1920 

TTCGTGTCTT TAAAACAAGA AACGGCTCCT CCCCTTATAC TAAGCTAACA GGTACCACCT 1980 

ACCTATAATA GTTTCTCAAA CCTATACACA TTCTTCTCTT TCCAACAATT CGACTTTCTT 2040 

CTTCAGTTTT TCCATCAACT CCTTTTCAAT AACATCTGAA GGAGTTTCCT TTACATTATA 2100

ATCATATTGA AATGATTCGT TAACTGCAGA AGTGATGAGT GGGGCCTCAT CAGATCTTGG 2160 

TTCACCGATG TATATCCTAC GATGGGGAGA GCGGCACCTT TCACAGACCA AATAAATTGC 2220 

CAATGCTATT ATACATAGTA CAATCATCGT AGAAAAGTAA ATAAGCCTTG GAAGGGTATA 2280 

TTTTAAAACG GTATTCCACA TACCAGTCGA ATTCATTGCA TCGACATGGG AATTGTTATT 2340 

AAACAAATTA AAAACCGTAG GTGAATAAAT TGTTCCCGAA TTATCGTAAA ATGGAGTCGT 2400 

TACAATATTA ATATCTGTGG AAATGGATTG GGGAGTAATA TTATAAATTG CATGCAATTC 2460 

CGAGTCACTG TCCAGCAAAT TGCCTACATG TCCAATCAGA TAATTTTCAA AGGCCATACG 2520

GTCGACATAG CGAATACATT TATGGGCCTG AAATGAACTA GCAGAGTGAT TACAGGCCCA 2580

AGATTTGCTA CTCCTTATAA TAACAACCAT TCTAAATATA TCGCTCATAT TTCTTCCAGC 2640

GAGAGATACA CGTATGTAAT AGCTGCCTGT ATCTGATACT CTGGGTTCGA CAATTCTCAA 2700 

CAATGTTCCC TTTTCGGGAA AAGGTTGTAC TCTACCATGA GGTTGAGCAT ATACCACAGC 2760 

GTCTCTACAG CCTCTGAATA CCGTGGCGTC TATAGCAGGA CAATCCATAT ATGATGTCGC 2820 

ATGTAGAACG GAATAGCATT TATATTCTTC ATCCCATTTC AAGATTTCCG TCGTTCCTGT 2880 

ATAAGAACTG GTCCGAGTCT GGTCGCCCAG GAATAAAAGT TGGCCGCGTA CATTCACCAT 2940 

TTTATCTAAT CCGCGGAACG CAACAAGAGC AGATTGGTCC GTTGATAACG TAACAGATGT 3000 

TCCAGTATAA ACTATAGACC AGATGCCTCG AAAGAGGCGG ATCCAAAATA ATAATTGTAG 3060 

TACATACATC GCAATCACAC ACGCCGTCTA TCTGACCAAA GAAGCTATAT AATGTCATTG 3120 

ATCAATGTAG TCAACTTCTT ATGCACTATA GCGACACGCC CATACCAAAC ACGTGTTATA 3180 

GGCGGGAATA TGCCCGTCTT GATGGACGAC GTCTATCATA CATCACCGTT TTGCGGTGTT 3240 

TGCGAAGGAT CCATATACCA GACCCGATTA CAATGAAACA AAACATGACG CATAGAACAA 3300 

CTAGTGAAAT TATTATTTGC GTTGGGGATT TTTTTTCTGT TGATATTTTA GGTGATTGAC 3360 

TGGATGCCTC CTCGACAGAG ATTATCGGTC GTTTAGTTAT GTATGGTGCT TTCTCAAAGT 3420 
CCTCGGGCCG ATTAGTGAGT CCAGTGTAAC CATCCACGGA GGTGGTAATG TTATTAGTAG 3480

TGGCCGATGT CATTTTGACG TCATTATATT TATCGTCGTC AGGTGATTTT AAATATGAAT 3540 

CTCTAAAACT TCTCGGCAAA TTTGAGACTT GTTGCATAGA TGTTTCGACG ACGATTGGTA 3600 

GATGTTTACC ACCATTTCGT TCAAGCCACT GCTTCACATT TTCTTCATGG GCTCTCGGGG 3660 

TTCCTGTTCG CACCATGTCT AACAGCCGTG TTGTACCATT GACAAAAAAA CTGAACGGTT 3720 

TGCATAGTTT GTTATCCTTT ATCCCCCGTC TCGAAAAACT ACATGTTCCT TTAACAGTCA 3780 

GTAATATGTC ACTCGAAATG GGTTCTCCAT TAATAATTAT TAAACGCGTA TATAATCCAC 3840 

TTAGCTCACG ACTGGGTGCT GCAATAATCA ATTTCAATTC ATCTCGATCG ATATAACTCG 3900 

TCATTGCATA TCTTCTATCC CACCATCCTA GGGACTTTAA TTTACAAGTT CCAAATGGTT 3960 

CATTAGTAGA GCAGTTGGCA TATTCACGTA AATAAATTGG TCTCCCACAT GCTCTGCCCA 4020 

AGACAAACCA TGCTACTAGA GCATCATATT TGCGTCGTCG CATCAACAGT AAACTGAGAA 4080 

GCGTATATTT CGGATCGCCA AATGGATTCA AAACAGCAAA ACTACAATTA TCCGTATATA 4140 

TTACGTGTTT TGTCTCATAT GAATTGTCAA GTGTATCAAA CAGTGGAGAC GGGATTTTGT 4200 

ATTCATTGAG AGTAGCCCGG AGGTTTTCAT TACCTTTCGG ATGTAATGTG GGAATGATCG 4260

GAGAATTGTC TTTCTTAAGT CCCATTGCGG ACATGTCCCC AGTTCCCAAA AGTAAACAGA 4320 

TTATAAGAAT CATTCTCGTG GATGAGATAT ATCTAAAAAA AATACTTTCA TATCTGTATC 4380 

TATTCATACC GAAGCTCTAA AAGGTGCCAA TATAACAGTT ATAGGAGACG CTATATATAT 4440 

TTTTATTTCC GTTATGTTCG ACGTTAAGCA GTTGTTTCTT GTAACTTAGT ATACGTTAGT 4500 

CATATCCCGA CGATGAAAAT AAGTATATAA GAAACGATCT GATATATTAT AATGTTAGAA 4560 

AAATGAGAAT GAAATTTTTA TTCCGTTACC AGCGCCAATT GATTCACATT TCAAAGATAC 4620 

CTTAAATATC TCGGGATTCT TCCATCGCCT CTTAGAGGAA CGTATATATT TTTCACGACG 4680 

TAGACCACTA TTGCCATGGG AGTTCAGATT TTGCCATGAA AAATCACGAC TTCGGTCAGG 4740 

CTTACGAACT TTATAAAAAA CACTTTTAGG GTCCTCTGCC TGCATCGAAG GTCCATTTAT 4800 

TAATGGACAT TCTTCCGAAT CGTTCATTAC AGACATGCTT GAAGTAACAT TGGATTGTGC 4860 

TAAGGAGTTA TGTATGTCTG AACATCTATC ATCTACCTTA CTCTCATCGT TTAGAGTATA 4920 

TCTCATGCAA TCCGAAAATG TTCCATAATG CTTTATTTGC GGTCTGTGGG AATATGGCGT 4980 

AGGTCCCGAA GGTGCCATTC GCAAAAAAGG TCTGCGAATC TTCAAAGACT TCGAATGTTG 5040 

TCATTATTGC AGGTAACGCG TAGTATATAT TATAAAATGA ATCATTGAAG TTATTTTTGA 5100 

CGGGTGTTTA CATATGAGCG GCAGTTATCG TGTATAATGC GTCAGCGGGT TCTTTAGTAA 5160 
AAAGAGGCAA CATTAAAATA TCTTGGGCAG ATGGTCTAAA CTCCTOATCG AATGTGAG" 5220

TTTTTGCAAT AGCATATTCA AGATCCATCG TCATACCACT CTTTCGTATA ATCTGAGGw* 5280 

TTGCATATGG ATGTCGTAAC TGAATCGCGT ACTGCTTGAA GTGTTTGCAT AAGTTTGTAG 5340 

AATTGTTCTG TGGAAATTCC AACGGATGGA CTTGCAGGCA TCTAATTATG GATCTCAGCT 5400 

GAGAACCTGA GCCGTTTACT TGTTTGCCAA AAAAGGTTAT ATTTTTTACT GACATCTCAA 5460 

ACAGAACTAA TCCTGCACTC CATATATCAG TTTTTGTACA GTATGGATCA AGTGCAAGCA 5520 

GTTCAGGCGA ATTGGTTTCC AGAGTTCCAC TCCATCCATA ACATTTGGGT TTATCTGTAT 5580 

GTTCATCTAA TTTACATGCT GCCCCAAAGT CCCCCAATAC TACATTTTCA GGTTTATCCA 5640 

AAAATATATT TTCAGTTTTT ACATCACGAT GTATTATACC CTTTTCGTGG ATATATGCCA 5700 

ATGCTCCAAG CAAACCCCGT TCTATCGTAA TTATTTGATT TAGTGGCAAT GGTCCCATGA 5760 

TATCTATGTA CGTAAACAAG TCGCATTTGT ATTTAGGCAT TACCATACAA ACTGTCGATT 5820 

TCCATCTATA AGCATGAACT AATCTAATTA TGGAGCGGTG AGACATTTTT TTTAATATAT 5880 

CAATTTCACT CCCAAGGGTT TTGCCACCAG TCACAGCTTT CACAATGACT TTTCTCTTGG 5940

TATTATCCCC ACGCTTTGTA CAAACATAGA TATACCCTTC AGATCCGGGC GGTAACGATG 6000

AAACA^TGTT ATACTGCATT CGCACAGCTG AAACTGCATC AATGTCCGTC ACCGTTTCGG 6060

GGGATTCATT TCCATCGTTC CCAAAAGGTG ACAAATCTTC AGATTGGGAC TCGGGAGAGC 6120 

TCTCAGTCTT CACATCGTCC ACATCCCCCG TCACTTCCGC GTTATCGGTA CTGTCGGGAA 6180 

AAGTATCAAA CAACGTCGTA TCCGTACCAT TGATGCTGTT ATGTATAATT CCGTAGGTAG 6240 

TATTAGTTTT AGAGTCGTGT ACTTTCGACG AAGAAATGCC ACATTCCATC GTTTCTGCCT 6300 

CCGGAGTCGA AGACATCCAG TCTATTACCT AGTTTTAACC CTGTTTCATA TTCTACCAGA 6360 

GTATAAGATT TGGAGATCAG ACCGGCCCAG TTATTAACAA TAAAAAAGAT TATTGGTGGA 6420 

GGTGAAG ATG GGT GTG TCC ATG ATA ACT ATA GTC ACA CTT CTA GAT GAA 6469 
Met Gly Val Ser Met Ile Thr Ile Val Thr Leu Leu Asp Glu
1 5 10 

TGC GAT CGA TTG CCA GGA AG:* TCT AGA GAT GCT GCA TCT ACT TTA TGG 6517 
Cys Asp Arg Leu Pro Gly Arg Ser Arg Asp Ala Ala Ser Thr Leu Trp
15 20 25 30 

ATA TTC CTT ATA AAG CAA TGT ATG GAA CAA ATA CAG GAT GAT GTG GGT 6565 
Ile Phe Leu Ile Lys Gln Cys Met Glu Gln Ile Gln Asp Asp Val Gly
35 40 45 

GTS CCC ATA ATC GCC AGA GCT GCA GAC CTA TTC CGT TTT GCC AAA CCC 6613 
Val Pro Ile Ile Ala Arg Ala Ala Asp Leu Phe Arg Phe Ala Lys Pro
50 55 60

ATG TTA ATT CTT CCT CGG CAA CAT CGA CCG ATA GTA AGG ACA AAG CCA 6661 
Met Leu Ile Leu Pro Arg Gln His Arg Pro Ile Val Arg Thr Lys Pro
65 70 75 

CCA GAT GGA ACT GGA GTT CGT GGT ACC GGA TTG GCC GGA ACT AGG GAT 6709 
Pro Asp Gly Thr Gly Val Arg Gly Thr Gly Leu Ala Gly Thr Arg Asp 
80 85 90 

TCG TTT ATA GTG CGG CTA TTT GAA GAT GTT GCA GGA TGT TCC ACA GAA 6757 
Ser Phe Ile Val Arg Leu Phe Glu Asp Val Ala Gly Cys Ser Thr Glu
95 100 105 110

TGG CAG GAT GTT CTA TCT GGA TAT TTG ATG TTG GAA TCT GAA GTT TCT 6805
Trp Gln Asp Val Leu Ser Gly Tyr Leu Met Leu Glu Ser Glu Val Ser
115 120 125


GGT AAT GCT CCA CAT AGC TTG TGG ATA GTT GGG GCG GCA GAT ATA TGT 6853 
Gly Asn Ala Pro His Ser Leu Trp Ile Val Gly Ala Ala Asp Ile Cys
130 135 140 

GCC ATT GCG CTC GAA TGT ATT CCT TTG CCA AAA AGG TTA CTT GCA ATC 6901 
Ala Ile Ala Leu Glu Cys Ile Pro Leu Pro Lys Arg Leu Leu Ala Ile
145 150 155

AAA GTG TCT GGG ACC TGG TCC GGT ATG CCG TGG GCC ATT CCC GAC AAT 6949
Lys Val Ser Gly Thr Trp Ser Gly Met Pro Trp Ala Ile Pro Asp Asn
160 165 170

ATT CAA ACT CTC TTG ACA TCT ACA TGG GAA CCG AAG TTC GAC ACC CCA 6997
Ile Gln Thr Leu Leu Thr Ser Thr Trp Glu Pro Lys Phe Asp Thr Pro
175 180 185 190 

GAA GAT AGA GCG CAT TTT TGC GAC AGT GAT ATG GTA TGT GTA TAC AAA 7045 
Glu Asp Arg Ala His Phe Cys Asp Ser Asp Met Val Cys Val Tyr Lys 
195 200 205 

ATC CTC GGG TCC CCA CCC AAT CCT CTA AAA CCT CCG GAA ATC GAA CCA 7093 
Ile Leu Gly Ser Pro Pro Asn Pro Leu Lys Pro Pro Glu Ile Glu Pro
210 215 220 

CCT CAA ATG AGT AGT ACA CCC GGC AGA TTA TTC TGT TGT GGA AAA TGT 7141 
Pro Gln Met Ser Ser Thr Pro Gly Arg Leu Phe Cys Cys Gly Lys Cys
225 230 235 

TGC AAG AAA GAA GAT AGA GAT GCG ATT GCA ATT CCG GTT CGT TAC ACT 7189 
Cys Lys Lys Glu Asp Arg Asp Ala Ile Ala Ile Pro Val Arg Tyr Thr
240 245 250 

GCG ACA GGA AAG TCA CGA ATA CAG AAA AAA TGT AGA GCC GGT AGT CAT 7237 
Ala Thr Gly Lys Ser Arg Ile Gln Lys Lys Cys Arg Ala Gly Ser His
255 260 265 270 

TAG CTGTTATTCG ACAGACCTAC TTGCTACCAA TTAGATATAA TTACATGATG 7290
End 
GGGCGTATAC ACATTACGAT TAGGTGCATC GCTACAACCG TCGCTATAGT GTCACGTATA 7350

ATTTGTATAT TAGTGCAATA ACAAACCCTT CTAGATCACT TATGTATCCA GGCTATCTTC 7410 

CATATACTTC TAACATCAGG AGAGATTCAA CAATCGAGCG CATTTGAAAG ACAACG ATG 7469 

Met 
1 


AGC AGA GTC AAT GCT ACA ATG TTC GAT GAT ATG GAT ATA CCA AGA GGA 7517 
Ser Arg Val Asn Ala Thr Met Phe Asp Asp Met Asp Ile Pro Arg Gly
5 10 15 

CGA TTT GGT AAG CCA CCG AGA AAG ATT ACT AAT GTA AAT TTT TGG CAT 7565 
Arg Phe Gly Lys Pro Pro Arg Lys Ile Thr Asn Val Asn Phe Trp His
20 25 30 

GTG GTT GTT GAT GAA TTC ACA GAA GGA ATC GTT CAA TGT ATG GAA GCC 7613 
Val Val Val Asp Glu Phe Thr Glu Gly Ile Val Gln Cys Met Glu Ala
35 40 45 

CGA GAG AGA TTA GGC CTT TTA TGT ACC ATA TCT ACT AAC GAG GGA TCT 7661 
Arg Glu Arg Leu Gly Leu Leu Cys Thr Ile Ser Thr Asn Glu Gly Ser
50 55 60 65 

ATT ACA TCG TTT GAT ATA CAC AAG GAT ATG TGG TGT CAA ATG GTT ATC 7709 
Ile Thr Ser Phe Asp Ile His Lys Asp Met Trp Cys Gln Met Val Ile
70 75 80

TGG TCT GCC TAT AGA TTT TTT GCC ATG ATG GAC AAA ATG TTT TCG ATT 7757 
Trp Ser Ala Tyr Arg Phe Phe Ala Met Met Asp Lys Met Phe Ser Ile
85 90 95 

GAA ACT ATC ACA AAT TTT ACA GAA ACT GAT CTT ACC GAA ACT GGT CAG 7805 
Glu Thr Ile Thr Asn Phe Thr Glu Thr Asp Leu Thr Glu Thr Gly Gln
100 105 110

TGG AGA ATA TTC TAT AGA ACT TGG GAT GTG AGA GAT GCA TTG AAG ATG 7853 
Trp Arg Ile Phe Tyr Arg Thr Trp Asp Val Arg Asp Ala Leu Lys Met
115 120 125 

AAA CAG GTG GGA CCA TTT TTG CCC GCA TTG TTT TCA TTT CAT CTG GAA 7901 
Lys Gln Val Gly Pro Phe Leu Pro Ala Leu Phe Ser Phe His Leu Glu
130 135 140 145 

AAC TGG ACC ACA ATG CTT TCC ATA GGA ATC AAC AAG GGT TAT GAT CGA 7949 
Asn Trp Thr Thr Met Leu Ser Ile Gly Ile Asn Lys Gly Tyr Asp Arg
150 155 160 

CAC AAT ACA CGA AAT ATG TTC ATG ACA ATA CAG TCT GCA AGA AAT GTC 7997 
His Asn Thr Arg Asn Met Phe Met Thr Ile Gln Ser Ala Arg Asn Val
165 170 175 

CTT AGC GGG GCA ATA GAG GTA GCT CGA TAT GCC GTG GTT CTT GCT CTA 8045 
Leu Ser Gly Ala Ile Glu Val Ala Arg Tyr Ala Val Val Leu Ala Leu
180 185 190 
CCT GTG TGC GAG TAT AGA ACA CCC TTA GGC CTG CCG GAT GAT AGC ATA 8093
Pro Val Cys Glu Tyr Arg Thr Pro Leu Gly Leu Pro Asp Asp Ser Ile
195 200 205 

GGA AAT GCC ATC AAG ACA TGC TGC ACG CAA ATG CAA GCG AAT CGA TTG 8141 
Gly Asn Ala Ile Lys Thr Cys Cys Thr Gln Met Gln Ala Asn Arg Leu
210 215 220 225 

ACA GAA ACT GGA ATA TCC AAG GAC AGT GGA CAT AAA ATA AAT GAT TCT 8189 
Thr Glu Thr Gly Ile Ser Lys Asp Ser Gly His Lys Ile Asn Asp Ser
230 235 240 

TCT GAA GAG GAG TTG TAT TAT AGA ACC ATA CAT GAT CTT ATC AAA CCT 8237
Ser Glu Glu Glu Leu Tyr Tyr Arg Thr Ile His Asp Leu Ile Lys Pro
245 250 255 

AAC CGG GAA CAT TGC ATA TCA TGC AAT ATT GAG AAT AGC ATG GAT ATA 8285 
Asn Arg Glu His Cys Ile Ser Cys Asn Ile Glu Asn Ser Met Asp Ile
260 265 270 

GAT CCC ACT ATT CAC CAT CGA TCT TCT AAT GTC ATA ACT TTA CAA GGT 8333 
Asp Pro Thr Ile His His Arg Ser Ser Asn Val Ile Thr Leu Gln Gly
275 280 285 

ACA TCA ACA TAT CCA TTT GGA CGC AGG CCG ATG AGT CGA ATG GAT GTT 8381 
Thr Ser Thr Tyr Pro Phe Gly Arg Arg Pro Met Ser Arg Met Asp Val 
290 295 300 305 

GGA GGT CTT ATG TAC CAG CAC CCC TAC ATT TGC CGC AAT CTC CAT TTA 8429 
Gly Gly Leu Met Tyr Gln His Pro Tyr Ile Cys Arg Asn Leu His Leu
310 315 320 

CGT CCG CCT CGA TCC AGA CTA ATG AAT AGT AAA ATC CTA CAG ACA TTT 8477 
Arg Pro Pro Arg Ser Arg Leu Met Asn Ser Lys Ile Leu Gln Thr Phe
325 330 335 

AGA CAA AGT TTC AAT CGA AGT AAT CCT CAT GCA TAC CCC ATA TAA 8522 
Arg Gln Ser Phe Asn Arg Ser Asn Pro His Ala Tyr Pro Ile End
340 345 350 

TACATACAAT CATGACAACA CTGTAATGCC TTATTGAAAA TAAAATTTTA TTATTTAAAC 8582 

AACGTTAGTA GCAGTTTTTC CTAAAATCCT ATTAATAATT GTGCGATTAG TTATAAGTAG 8642

GATTCCCCGT CTCCTGTTGG CGATTCCCGA AGATTTGTCA GATAATGTGC CAATTCAGCA 8702 

TCATCACCGA TTGCTGCATT CCCCTTAGTA GCGACGGCAC GACATAAAGG TTTCCAATAA 8762 

GACTCTATTT CGGGGAGTGG ACTTATTCCA CAGCCCGTTG CCGAACCTAC TATGTCCATA 8822 

AGACGGACAT TCTTCTCATA TAAGCGCGAA ACAGTACAGT ATCCAGCATG TCCAAGACAA 8882 

CACCAATACA TCATGATAGT AAACCGAGTG TCCATTTCTT CGTGTGTAAG AGGAGCACGT 8942 

TCAATACACC GTAAAGCCCG GCCTACAATT TTTCTCGTCG GGTCTGTCGG GTGGAATGGC 9002 


GAAGCAGAAA TATCATATTC GTTAAGCGTG ACAGTCATTC TGTCGAATAT CTCACCCCAC 9062 
AAACGAGACG ATGGTGGTTT TCCAGCTTTC ATAGCAGCCT GGGAGATCGT AGCGGCGGTT 9122

AATATGGTCC TGGCTAGACT GCGTACAGAT TTAGGCAATA GCGCAACATG TTCCCCGCCG 9182 

GCAGAAAGTA TATCATAACT CTGTTCTTTT GGAGAATCTA CCCGGAGTTG CACACTCCTG 9242 

CTAGATTTGC GCCGTAGAGA CCACATGGCC ATACCTCTCC AATATGTTTT AATCTTACAC 9302 

GGCGCTCGTC TCCAGTATTC AAACACGTTC TCCTCTCATT AGGCTCAACG CCACATTAAA 9362 

TATCTTCATA TACAACAAAA AGGCAACACG TTATTTGACA CGCCCCTTCA TGGATGGGGG 9422 

GGGTCAGCGT TTGTTGCAAC AGATCATGAC AAATAAATCC AAAATCTATT ATTTTATCTC 9482

ATTAGATAGA TCAAAGAATG TCGGCTCTAT GTCTAACAAT TAAAATTATA TAATAAGAGC 9542 

TTTCTCTTCA AGTCTGGATA GTTAATGCAA TTTACTGTCT ACCGACAAAT CGTTCATTCC 9602 

TTTTACATCG CAGTCTGAAG AAATAGTTCC CGAGGACGCA GCGATTGGGT GAAAAATGCT 9662 

ATCGGAGGCA TATATATCGG ATATAGGATG GGCGCTTCGA CTATCAGCAT CCCTCAGAGT 9722 

CCTGCGCAGA TGTAGACTTT GGCGTGGGGT CAAATTCATG ATAGTTTCCC ATTCGGCTTG 9782 

TTTTAGTCGA TATCCCATTC GACCAATCAT ATGAATATCG AATAGTGCTC TCCGAAGAGC 9842

ATCGTGGAAC GGACCGCTAT TTAGTCGACA TCGAATAAAA CATCGAAATA GTTTGTTTGT 9902 

ATCCGCACAT AACCGAGCGA CATCGGGTTT CCATGGTAGA GGACAAAATT TGCCCACATT 9962 

ATl/iAGTTCA AAGTCTTGAT CGGACGAGTC ACTGCCATAT TCCGGATGTG AATGTGGCAG 10022 

TTGATAATCT TCGTCGTCGC TCTCATTATC TGACGATGAT AATCGTGTAT CGGGTCTGGC 10082 

TCGATCTCGA TCACGACTCA TGTTGCCTCC GATGGAGCCG AAAGCAGGTT TTCTGCTCAA 10142 

GTGTAATTTG GAGACTTTGG CCTGTATTAT ATAGCTACCA GCTTTTATCT TCTGCTAGGA 10202 

ACAATAATTG CTAGAATTTA CATCACGTGA TATCCGGTCA AAAATTACTT GGTCTTTAAC 10262 

CCAGCCCCTA ATGTACTACT TGCTCTATAT ATTCTCCACA ATGGTAAACC TCCCTCCCTA 10322 

AAGATTTCAC TCCAATTTCA AGGAATTC 10350 
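SEQ ID NO: 2 is declared anti-sense and has the same 10,350 bp length as SEQ ID NO: 1; its first bases read as the reverse complement of the last bases of SEQ ID NO: 1 (...CAGGTCGAC at the end of one, GTCGACCTG... at the start of the other). A minimal sketch of that relationship, assuming the two entries span exactly the same region so that position p on one strand pairs with position 10,351 - p on the other; the helper names are hypothetical.

```python
# Pairing rule for complementary DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TOTAL_LENGTH = 10_350  # both SEQ ID NO: 1 and SEQ ID NO: 2 are 10,350 bp


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; spaces used for grouping are dropped."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


def other_strand_position(pos: int, length: int = TOTAL_LENGTH) -> int:
    """Map a 1-based position on one strand to the paired base on the other."""
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside 1..{length}")
    return length + 1 - pos


if __name__ == "__main__":
    # Last 49 bases of SEQ ID NO: 1 (positions 10302-10350).
    tail = "TGAACCATTG CCTCCGAGCA GTTGGCGATC CGTTGACCTG CAGGTCGAC"
    print(reverse_complement(tail))  # compare with the start of SEQ ID NO: 2
    print(other_strand_position(5964))  # start of the gD range in claim 10
```

Under this assumption, a coding region read left to right in SEQ ID NO: 1 appears on SEQ ID NO: 2 as a reversed, complemented span.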
(4) Information for SEQ ID NO: 3

(i) Sequence Characteristics:

(A) Length: 497 amino acids 

(B) Type: peptide 

(C) Strandedness: single

(D) Topology: linear 

(ii) Molecule Type: 

(A) Description: polypeptide 

(vi) ORIGINAL SOURCE: 

(A) Organism: MDV, GA strain 

(vii) IMMEDIATE SOURCE: 

(A) Library: genomic


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Cys Val Phe Gln Ile Leu Ile Ile Val Thr Thr Ile Lys Val Ala
-15 -10 -5

Gly Thr Ala Asn Ile Asn His Ile Asp Val Pro Ala Gly His Ser Ala
1 5 10

Thr Thr Thr Ile Pro Arg Tyr Pro Pro Val Val Asp Gly Thr Leu Tyr
15 20 25 30

Thr Glu Thr Trp Thr Trp Ile Pro Asn His Cys Asn Glu Thr Ala Thr
35 40 45

Gly Tyr Val Cys Leu Glu Ser Ala His Cys Phe Thr Asp Leu Ile Leu
50 55 60

Gly Val Ser Cys Met Arg Tyr Ala Asp Glu Ile Val Leu Arg Thr Asp
65 70 75

Lys Phe Ile Val Asp Ala Gly Ser Ile Lys Gln Ile Glu Ser Leu Ser
80 85 90

Leu Asn Gly Val Pro Asn Ile Phe Leu Ser Thr Lys Ala Ser Asn Lys
95 100 105 110

Leu Glu Ile Leu Asn Ala Ser Leu Gln Asn Ala Gly Ile Tyr Ile Arg
115 120 125

Tyr Ser Arg Asn Gly Thr Arg Thr Ala Lys Leu Asp Val Val Val Val
130 135 140

Gly Val Leu Gly Gln Ala Arg Asp Arg Leu Pro Gln Met Ser Ser Pro
145 150 155

Met Ile Ser Ser His Ala Asp Ile Lys Leu Ser Leu Lys Asn Phe Lys
160 165 170

Ala Leu Val Tyr His Val Gly Asp Thr Ile Asn Val Ser Thr Ala Val
175 180 185 190

Ile Leu Gly Pro Ser Pro Glu Ile Phe Thr Leu Glu Phe Arg Val Leu
195 200 205

Phe Leu Arg Tyr Asn Pro Thr Cys Lys Phe Val Thr Ile Tyr Glu Pro
210 215 220

Cys Ile Phe His Pro Lys Glu Pro Glu Cys Ile Thr Thr Ala Glu Gln
225 230 235

Ser Val Cys His Phe Ala Ser Asn Ile Asp Ile Leu Gln Ile Ala Ala
240 245 250

Ala Arg Ser Glu Asn Cys Ser Thr Gly Tyr Arg Arg Cys Ile Tyr Asp
255 260 265 270

Thr Ala Ile Asp Glu Ser Val Gln Ala Arg Leu Thr Phe Ile Glu Pro
275 280 285

Gly Ile Pro Ser Phe Lys Met Lys Asp Val Gln Val Asp Asp Ala Gly
290 295 300

Leu Tyr Val Val Val Ala Leu Tyr Asn Gly Arg Pro Ser Ala Trp Thr
305 310 315

Tyr Ile Tyr Leu Ser Thr Val Glu Thr Tyr Leu Asn Val Tyr Glu Asn
320 325 330

Tyr His Lys Pro Gly Phe Gly Tyr Lys Ser Phe Leu Gln Asn Ser Ser
335 340 345 350

Ile Val Asp Glu Asn Glu Ala Ser Asp Trp Ser Ser Ser Ser Ile Lys
355 360 365

Arg Arg Asn Asn Gly Thr Ile Ile Tyr Asp Ile Leu Leu Thr Ser Leu
370 375 380

Ser Ile Gly Ala Ile Ile Ile Val Ile Val Gly Gly Val Cys Ile Ala
385 390 395

Ile Leu Ile Arg Arg Arg Arg Arg Arg Arg Thr Arg Gly Leu Phe Asp
400 405 410

Glu Tyr Pro Lys Tyr Met Thr Leu Pro Gly Asn Asp Leu Gly Gly Met
415 420 425 430

Asn Val Pro Tyr Asp Asn Thr Cys Ser Gly Asn Gln Val Glu Tyr Tyr
435 440 445

Gln Glu Lys Ser Ala Lys Met Lys Arg Met Gly Ser Gly Tyr Thr Ala
450 455 460

Trp Leu Lys Asn Asp Met Pro Lys Ile Arg Lys Arg Leu Asp Leu Tyr
465 470 475

His End
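The tick marks in SEQ ID NO: 3 (-15, -10, -5 under the first row; 1, 5, 10 under the second) imply that the first 18 residues are numbered -18 to -1 with no residue 0, so the terminal His is residue 479 and the precursor totals 18 + 479 = 497 residues, the declared length. A sketch of that numbering follows; the 18-residue leader is an inference from the layout, not a value stated in the listing, and the function name is hypothetical.

```python
def listing_number(index: int, signal_len: int) -> int:
    """Convert a 1-based precursor index to the residue number printed in the listing.

    Leader residues are numbered -signal_len .. -1 and the mature chain
    1, 2, 3, ...; no residue is numbered 0.
    """
    if index < 1:
        raise ValueError("index is 1-based")
    if index <= signal_len:
        return index - signal_len - 1
    return index - signal_len


if __name__ == "__main__":
    # Assumed 18-residue leader for gE, inferred from the tick marks.
    for i in (1, 18, 19, 497):
        print(i, listing_number(i, 18))
```

With these assumptions the first Met maps to -18, the Ala that opens the second row's numbered stretch maps to 1, and the 497th residue maps to 479.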
WE CLAIM:

-1- 

A 2.53 kb segment of DNA with a gene coding for MDV
glycoprotein E (gE) precursor, between an 8488 and 9978 bp
sequence of Marek's disease herpesvirus DNA and identified
as part of SEQ ID NO: 1, and containing potential promoter
sequences up to 400 nucleotides 5' of each gene and
subfragments of the DNA which selectively recognize the DNA
when in the form of a probe.

-2- 

The segment of Claim 1 wherein the glycoprotein 
encoded contains 497 amino acids. 

-3- 

The substantially pure glycoprotein gE precursor 
which comprises: 

Met Cys Val Phe Gln Ile Leu Ile Ile Val Thr Thr Ile Lys Val Ala
-15 -10 -5

Gly Thr Ala Asn Ile Asn His Ile Asp Val Pro Ala Gly His Ser Ala
1 5 10

Thr Thr Thr Ile Pro Arg Tyr Pro Pro Val Val Asp Gly Thr Leu Tyr
15 20 25 30

Thr Glu Thr Trp Thr Trp Ile Pro Asn His Cys Asn Glu Thr Ala Thr
35 40 45

Gly Tyr Val Cys Leu Glu Ser Ala His Cys Phe Thr Asp Leu Ile Leu
50 55 60

Gly Val Ser Cys Met Arg Tyr Ala Asp Glu Ile Val Leu Arg Thr Asp
65 70 75

Lys Phe Ile Val Asp Ala Gly Ser Ile Lys Gln Ile Glu Ser Leu Ser
80 85 90

Leu Asn Gly Val Pro Asn Ile Phe Leu Ser Thr Lys Ala Ser Asn Lys
95 100 105 110

Leu Glu Ile Leu Asn Ala Ser Leu Gln Asn Ala Gly Ile Tyr Ile Arg
115 120 125

Tyr Ser Arg Asn Gly Thr Arg Thr Ala Lys Leu Asp Val Val Val Val
130 135 140

Gly Val Leu Gly Gln Ala Arg Asp Arg Leu Pro Gln Met Ser Ser Pro
145 150 155

Met Ile Ser Ser His Ala Asp Ile Lys Leu Ser Leu Lys Asn Phe Lys
160 165 170

Ala Leu Val Tyr His Val Gly Asp Thr Ile Asn Val Ser Thr Ala Val
175 180 185 190

Ile Leu Gly Pro Ser Pro Glu Ile Phe Thr Leu Glu Phe Arg Val Leu
195 200 205

Phe Leu Arg Tyr Asn Pro Thr Cys Lys Phe Val Thr Ile Tyr Glu Pro
210 215 220

Cys Ile Phe His Pro Lys Glu Pro Glu Cys Ile Thr Thr Ala Glu Gln
225 230 235

Ser Val Cys His Phe Ala Ser Asn Ile Asp Ile Leu Gln Ile Ala Ala
240 245 250

Ala Arg Ser Glu Asn Cys Ser Thr Gly Tyr Arg Arg Cys Ile Tyr Asp
255 260 265 270

Thr Ala Ile Asp Glu Ser Val Gln Ala Arg Leu Thr Phe Ile Glu Pro
275 280 285

Gly Ile Pro Ser Phe Lys Met Lys Asp Val Gln Val Asp Asp Ala Gly
290 295 300

Leu Tyr Val Val Val Ala Leu Tyr Asn Gly Arg Pro Ser Ala Trp Thr
305 310 315

Tyr Ile Tyr Leu Ser Thr Val Glu Thr Tyr Leu Asn Val Tyr Glu Asn
320 325 330

Tyr His Lys Pro Gly Phe Gly Tyr Lys Ser Phe Leu Gln Asn Ser Ser
335 340 345 350

Ile Val Asp Glu Asn Glu Ala Ser Asp Trp Ser Ser Ser Ser Ile Lys
355 360 365

Arg Arg Asn Asn Gly Thr Ile Ile Tyr Asp Ile Leu Leu Thr Ser Leu
370 375 380

Ser Ile Gly Ala Ile Ile Ile Val Ile Val Gly Gly Val Cys Ile Ala
385 390 395

Ile Leu Ile Arg Arg Arg Arg Arg Arg Arg Thr Arg Gly Leu Phe Asp
400 405 410

Glu Tyr Pro Lys Tyr Met Thr Leu Pro Gly Asn Asp Leu Gly Gly Met
415 420 425 430

Asn Val Pro Tyr Asp Asn Thr Cys Ser Gly Asn Gln Val Glu Tyr Tyr
435 440 445

Gln Glu Lys Ser Ala Lys Met Lys Arg Met Gly Ser Gly Tyr Thr Ala
450 455 460

Trp Leu Lys Asn Asp Met Pro Lys Ile Arg Lys Arg Leu Asp Leu Tyr
465 470 475

His End
-4-

A method for reducing pathogenicity or 
virulence of a Marek's disease herpesvirus whereby a gene 
which encodes for a glycoprotein selected from 
glycoproteins I and E is altered.

-5- 

In a method for producing a virus vector
vaccine, for use in vivo, or an in vitro expression vector,
for a protein which produces antibodies against Marek's
disease by providing in the vaccine or vector a segment of
DNA from the genome of a Marek's disease herpesvirus that
encodes a protein producing an antibody response in birds,
the improvement which comprises:

inserting into the vaccine or vector a segment
of DNA containing all or part of a gene encoding a
glycoprotein selected from the group consisting of gD, gI
and gE of the Marek's disease herpesvirus.

-6- 

A method for providing foreign genes in a 
Marek's disease herpesvirus which comprises inserting the 
foreign gene into a region of DNA of the herpesvirus which 
encodes a non-essential protein selected from the group 
consisting of gI and gE.
-7-

A segment of DNA of Marek's disease herpesvirus 
(MDV) genome with genes encoding multiple glycoproteins 
between a 1 and 8799 nucleotide sequence and identified as 
SEQ ID NO: 1, and optionally containing regulatory sequences
up to 400 nucleotides 5' of each gene as shown in Figure 2 
and subsegments of the DNA which selectively recognize 
portions of the segment of the DNA when in the form of a 
probe. 

-8- 

An EcoRI I segment of Marek's disease
herpesvirus genome encoding MDV glycoprotein D (gD) 
precursor and regulatory sequences and subsegments of the 
DNA which selectively recognize the DNA when in the form of
a probe. 

-9- 

A 1.75 kbp NcoI-SstII segment of DNA of Marek's
disease herpesvirus with a gene encoding MDV gD and 
containing a 5' regulatory region with the gene and 
subsegments of the DNA which selectively recognize the DNA 
when in the form of a probe. 

-10- 

A segment of DNA encoding MDV gD precursor,
between a 5964 and 7175 bp nucleotide sequence of Marek's
disease herpesvirus and identified as part of SEQ ID NO: 1,
and optionally containing a 5' regulatory region of up to
400 bp in length as shown in Figure 2 and subsegments of
the segment of DNA which selectively recognize the DNA when
in the form of a probe.


-11-

The segment of Claim 10 wherein the glycoprotein 
encoded contains 403 amino acids.

-12- 

A segment of DNA with a gene encoding a 
glycoprotein I (gI) precursor between a 7282 and 8349 bp
DNA sequence of Marek's disease herpesvirus and identified 
as part of SEQ ID NO: 1, and optionally containing a 5'
regulatory region with the gene of up to 400 bp in length 
as shown in Figure 2 and subfragments of the segment of DNA 
which selectively recognize the DNA when in the form of a 
probe. 

-13- 

The segment of Claim 12 wherein the glycoprotein
encoded contains 355 amino acids.

-14-

A segment of DNA with a gene encoding a part of 
MDV glycoprotein E (gE) precursor, between an 8488 and 8799
bp DNA sequence of Marek's disease herpesvirus and 
identified as part of SEQ ID NO: 1, and optionally
containing a 5' regulatory region with the gene of up to 
400 bp in length, as shown in Figure 2 and subfragments of 
the DNA which selectively recognize the DNA when in the 
form of a probe.

-15- 

The segment of Claim 14 wherein the glycoprotein
encoded contains 104 amino acids. 
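The nucleotide ranges in claims 1, 10, 12 and 14 can be checked against the residue counts in claims 2, 11, 13 and 15: an inclusive range covers end - start + 1 nucleotides, one codon per three. The sketch below does that arithmetic. Whether each range includes its stop codon is an assumption read off the sequence listings (for gE, the TGA stop of SEQ ID NO: 1 sits just after position 9978); the claims do not state it, and the names used here are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Nucleotides in the inclusive 1-based range start..end."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Whole codons in the range; raises if the length is not a multiple of 3."""
    n = span_length(start, end)
    if n % 3:
        raise ValueError(f"{start}..{end} spans {n} nt, not a whole number of codons")
    return n // 3


# name -> ((start, end), claimed residues, range assumed to include a stop codon)
CLAIMED = {
    "gE, claims 1-2": ((8488, 9978), 497, False),
    "gD, claims 10-11": ((5964, 7175), 403, True),
    "gI, claims 12-13": ((7282, 8349), 355, True),
    "partial gE, claims 14-15": ((8488, 8799), 104, False),
}

if __name__ == "__main__":
    for name, ((start, end), residues, has_stop) in CLAIMED.items():
        codons = codon_count(start, end)
        consistent = codons - int(has_stop) == residues
        print(f"{name}: {span_length(start, end)} nt, {codons} codons, consistent={consistent}")
```

Under those assumptions every entry is consistent: 1491 nt (497 codons) for gE, 1212 nt (404 codons, 403 residues plus a stop) for gD, 1068 nt (356 codons) for gI, and 312 nt (104 codons) for the partial gE.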
-16-

The substantially pure glycoprotein gI precursor
which comprises: 


Met Tyr Val Leu Gln Leu Leu Phe Trp Ile Arg Leu Phe Arg Gly Ile
-15 -10 -5

Trp Ser Ile Val Tyr Thr Gly Thr Ser Val Thr Leu Ser Thr Asp Gln
1 5 10

Ser Ala Leu Val Ala Phe Arg Gly Leu Asp Lys Met Val Asn Val Arg
15 20 25 30

Gly Gln Leu Leu Phe Leu Gly Asp Gln Thr Arg Thr Ser Ser Tyr Thr
35 40 45

Gly Thr Thr Glu Ile Leu Lys Trp Asp Glu Glu Tyr Lys Cys Tyr Ser
50 55 60

Val Leu His Ala Thr Ser Tyr Met Asp Cys Pro Ala Ile Asp Ala Thr
65 70 75

Val Phe Arg Gly Cys Arg Asp Ala Val Val Tyr Ala Gln Pro His Gly
80 85 90

Arg Val Gln Pro Phe Pro Glu Lys Gly Thr Leu Leu Arg Ile Val Glu
95 100 105 110

Pro Arg Val Ser Asp Thr Gly Ser Tyr Tyr Ile Arg Val Ser Leu Ala
115 120 125

Gly Arg Asn Met Ser Asp Ile Phe Arg Met Val Val Ile Ile Arg Ser
130 135 140

Ser Lys Ser Trp Ala Cys Asn His Ser Ala Ser Ser Phe Gln Ala His
145 150 155

Lys Cys Ile Arg Tyr Val Asp Arg Met Ala Phe Glu Asn Tyr Leu Ile
160 165 170

Gly His Val Gly Asn Leu Leu Asp Ser Asp Ser Glu Leu His Ala Ile
175 180 185 190

Tyr Asn Ile Thr Pro Gln Ser Ile Ser Thr Asp Ile Asn Ile Val Thr
195 200 205

Thr Pro Phe Tyr Asp Asn Ser Gly Thr Ile Tyr Ser Pro Thr Val Phe
210 215 220

Asn Leu Phe Asn Asn Asn Ser His Val Asp Ala Met Asn Ser Thr Gly
225 230 235

Met Trp Asn Thr Val Leu Lys Tyr Thr Leu Pro Arg Leu Ile Tyr Phe
240 245 250

Ser Thr Met Ile Val Leu Cys Ile Ile Ala Leu Ala Ile Tyr Leu Val
255 260 265 270

Cys Glu Arg Cys Arg Ser Pro His Arg Arg Ile Tyr Ile Gly Glu Pro
275 280 285

Arg Ser Asp Glu Ala Pro Leu Ile Thr Ser Ala Val Asn Glu Ser Phe
290 295 300

Gln Tyr Asp Tyr Asn Val Lys Glu Thr Pro Ser Asp Val Ile Glu Lys
305 310 315

Glu Leu Met Glu Lys Leu Lys Lys Lys Val Glu Leu Leu Glu Arg Glu
320 325 330

Glu Cys Val
335
-17-

The substantially pure glycoprotein gD partial 
precursor which comprises: 

Met Asn Arg Tyr Arg Tyr Glu Ser Ile Phe Phe Arg Tyr Ile Ser Ser
        -30                 -25                 -20

Thr Arg Met Ile Leu Ile Ile Cys Leu Leu Leu Gly Thr Gly Asp Met
    -15                 -10                 -5

Ser Ala Met Gly Leu Lys Lys Asp Asn Ser Pro Ile Ile Pro Thr Leu
1               5                   10                  15

His Pro Lys Gly Asn Glu Asn Leu Arg Ala Thr Leu Asn Glu Tyr Lys
            20                  25                  30

Ile Pro Ser Pro Leu Phe Asp Thr Leu Asp Asn Ser Tyr Glu Thr Lys
        35                  40                  45

His Val Ile Tyr Thr Asp Asn Cys Ser Phe Ala Val Leu Asn Pro Phe
    50                  55                  60

Gly Asp Pro Lys Tyr Thr Leu Leu Ser Leu Leu Leu Met Gly Arg Arg
65                  70                  75                  80

Lys Tyr Asp Ala Leu Val Ala Trp Phe Val Leu Gly Arg Ala Cys Gly
                85                  90                  95

Arg Pro Ile Tyr Leu Arg Glu Tyr Ala Asn Cys Ser Thr Asn Glu Pro
            100                 105                 110

Phe Gly Thr Cys Lys Leu Lys Ser Leu Gly Trp Trp Asp Arg Arg Tyr
        115                 120                 125

Ala Met Thr Ser Tyr Ile Asp Arg Asp Glu Leu Lys Leu Ile Ile Ala
    130                 135                 140

Ala Pro Ser Arg Glu Leu Ser Gly Leu Tyr Thr Arg Leu Ile Ile Ile
145                 150                 155                 160

Asn Gly Glu Pro Ile Ser Ser Asp Ile Leu Leu Thr Val Lys Gly Thr
                165                 170                 175

Cys Ser Phe Ser Arg Arg Gly Ile Lys Asp Asn Lys Leu Cys Lys Pro
            180                 185                 190

Phe Ser Phe Phe Val Asn Gly Thr Thr Arg Leu Leu Asp Met Val Arg
        195                 200                 205

Thr Gly Thr Pro Arg Ala His Glu Glu Asn Val Lys Gln Trp Leu Glu
    210                 215                 220

Arg Asn Gly Gly Lys His Leu Pro Ile Val Val Glu Thr Ser Met Gln
225                 230                 235                 240

Gln Val Ser Asn Leu Pro Arg Ser Phe Arg Asp Ser Tyr Leu Lys Ser
                245                 250                 255

Pro Asp Asp Asp Lys Tyr Asn Asp Val Lys Met Thr Ser Ala Thr Thr
            260                 265                 270

Asn Asn Ile Thr Thr Ser Val Asp Gly Tyr Thr Gly Leu Thr Asn Arg
        275                 280                 285

Pro Glu Asp Phe Glu Lys Ala Pro Tyr Ile Thr Lys Arg Pro Ile Ile
    290                 295                 300

Ser Val Glu Glu Ala Ser Ser Gln Ser Pro Lys Ile Ser Thr Glu Lys
305                 310                 315                 320

Lys Ser Arg Thr Gln Ile Ile Ile Ser Leu Val Val Leu Cys Val Met
                325                 330                 335

Phe Cys Phe Ile Val Ile Gly Ser Gly Ile Trp Ile Leu Arg Lys His
            340                 345                 350

Arg Lys Thr Val Met Tyr Asp Arg Arg Arg Pro Ser Arg Arg Ala Tyr
        355                 360                 365

Ser Arg Leu
    370
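The claims give sequences in three-letter code, whereas Figure 2 prints one-letter translations above the nucleotide lines. A minimal Python sketch for converting between the two and for reproducing the claim's position labels follows. The helper names are hypothetical, and the 32-residue signal region used for gD is an assumption read off the position labels (the label -30 falls on the third residue and the label 1 on the thirty-third).

```python
# Hypothetical helpers: convert a claim's three-letter residue listing
# into one-letter codes, and map a 0-based precursor index onto the
# claim's position labels (negative for the signal region, no zero).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(residues: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[token] for token in residues.split())


def claim_position(index: int, signal_length: int) -> int:
    """Map a 0-based precursor index to the claim numbering.

    Signal residues are numbered -signal_length..-1 and the first
    mature residue is numbered 1; position 0 is skipped.
    """
    if index < signal_length:
        return index - signal_length
    return index - signal_length + 1


row1 = "Met Asn Arg Tyr Arg Tyr Glu Ser Ile Phe Phe Arg Tyr Ile Ser Ser"
print(three_to_one(row1))     # MNRYRYESIFFRYISS
print(claim_position(0, 32))  # -32
print(claim_position(32, 32)) # 1
```

Skipping position 0 mirrors the listing, whose labels run from -1 directly to 1.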
-18-

The substantially pure glycoprotein gE precursor 
which comprises: 


Met Cys Val Phe Gln Ile Leu Ile Ile Val Thr Thr Ile Lys Val Ala
    -15                 -10                 -5

Gly Thr Ala Asn Ile Asn His Ile Asp Val Pro Arg Gly His Ser Ala
1               5                   10                  15

Thr Thr Thr Ile Pro Arg Tyr Pro Pro Val Val Asp Gly Thr Leu Tyr
            20                  25                  30

Thr Glu Thr Trp Thr Trp Ile Pro Asn His Cys Asn Glu Thr Ala Thr
        35                  40                  45

Gly Tyr Val Cys Leu Glu Ser Ala His Cys Phe Thr Asp Leu Ile Leu
    50                  55                  60

Gly Val Ser Cys Met Arg Tyr Ala Asp Glu Ile Val Leu Arg Thr Asp
65                  70                  75                  80

Lys Phe Ile Val Asp Ala Gly Ser
                85



-19- 

A 5' regulatory region for glycoprotein gD between
nucleotides 5664 and 5963 of the nucleotide sequence shown in
Figure 2.


-20-

A 5' regulatory region for glycoprotein gE between
nucleotides 8088 and 8487 of the nucleotide sequence shown in
Figure 2.


-21- 

A regulatory region for glycoprotein gI between nucleotides
6882 and 7281 of the nucleotide sequence shown in Figure 2.
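The regulatory-region claims use 1-based, inclusive nucleotide coordinates from Figure 2. A minimal Python sketch of the corresponding slice arithmetic follows. The function names are hypothetical, and the Figure 2 sequence itself would have to be supplied as a string; it is not reproduced in the code.

```python
def region(sequence: str, start: int, end: int) -> str:
    """Return nucleotides start..end of `sequence`.

    Coordinates are 1-based and inclusive, as in Figure 2; Python slices
    are 0-based and end-exclusive, hence the shift of the start only.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"coordinates {start}..{end} outside 1..{len(sequence)}")
    return sequence[start - 1:end]


def region_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


# Coordinates as recited in the claims.
CLAIMED_REGIONS = {"gD": (5664, 5963), "gE": (8088, 8487), "gI": (6882, 7281)}

for name, (start, end) in CLAIMED_REGIONS.items():
    print(name, region_length(start, end))
# gD 300
# gE 400
# gI 400

# Toy check of the slice convention on a short made-up sequence.
print(region("ACGTACGT", 2, 5))  # CGTA
```

By this arithmetic the gD region spans 300 nucleotides and the gE and gI regions 400 each. Counting along the 5901 and 7201 lines of Figure 2, the gD and gI start codons appear to begin at nucleotides 5964 and 7282, so those two claimed regions would end immediately upstream of the coding sequence.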
Fig. 2


1 CAATTCCTlGAAATTGCACTCAAATCTTlACCGAGCCACCTTTACCAnGTCGAGAATATAT^AGCAAGTAGTACATTACCGGCTGCGT7AAACACCAA 

101 GTAATUTTGACCGGATATCACGTGA TGTAAAT TCTAGCAATTATTGTTCCTACCAGAA GATAAAA GCTGGTAGC TATATAATA CAGGCCAAAGKTCCA 

US1 7 HSRDRDRARPOTRLSSSD 

201 AATTACACTTGAGCAGAAAACCTCCTTTCGGCTCCATCGGAGGCAACATGAGTCGTGATCGACATCGAGCCAGACCCGATACACGATTATCATCGTCAGA

19 HES00EDYOLPHSHPEYGSDSSOO0FELHHVGK 

301 TAATGAGAGCGACGACGAAGATTATCAACTGCCACATTCACATCCGGAATATGGCAGTGACTCGTCCGATCAAGACTTTGAACTTAATAATGTGGGCAAA

53FCPLPUKPOVARLCAOTMKLFRCFIRCRLHSGPF 

401 TTTTGTCCTCTACCATGGAAACCCGATGTCGCTCCGTTATGTGCGGATACAAACAAACTATTTCGATGTTTTATTCGATGTCGACTAAATAGCGGTCCGT

86 HDALRRAL FD1H HI GRKGYRLKOAEUET I H M t T 

501 TCCACGATGCTCTTCGGACAGCACTATTCGATATTCATATGATTCGTCGAATGGGATATCGACTAAAACAAGCCGAATGGGAAACTATCATGAATTTGAC

119 PRQSIHIRRTIRDADSRSAHPI SO 1 YA50SI F H 

601 CCCACGCCAAAGTCTACATCTGCGCAGGACTCTGAGGGATCCTGATACTCGAAGCGCCCATCCTATATCCCATATATATGCCTCCCATAGCATTTTTCAC

153 P IAASSGT I SSDCDVJCGHNDLSV0SKLH*179 

701 CCAATCGCTGCGTCCTCGGGAACTATTTCTTCAGACTGCGATGTAAAAGGAATGAACGATTTGTCGGTAGACAGTAAATTGCATTAACTATCCAGACTTG

801 AAGAGAAAGCTCTTATTATATAATTTTAATTCTTAGACATAGAGCCGACATTCTTTGATCTATCTAATGAGATAAAATAATAGATTTTGGATTTATTTCT

901 CATGATCTCTTGCAACAAACGC1GACCCCCCCCATCCATGAAGGGGCCTGTCAAATAACGTGTTGCCTTTTTCT TCTATATGA AGA TATTTAA 7GTGGCG 

US10 1 KAKUSLRR 

1001 TTGAGCCTAATGAGAGGAGAACGTGTTTGAATACTGGAGACGAGCGCCGTGTAAGATTAAAACATATTGGAGAGGTATGGCCATGTGGTC7CTACGGCGC 

9KSSRSVOIRV0SPKE0SYD 1 tSAGGEHVALLPKS 

1101 AAATCTAGCAGGAGTGTCCAACTCCGGGTAGATTCTCCAAAACAACAGAGTTATGATATACTTTCTCCCGGCGGGGAACATGTTGCGC7ATTGCCTAAA1 

A3 VRSLARTI LTAAT ISOAAHKAGICPPSSRLUGEI 

1201 CTG7ACGCAGTCTAGCCAGGACCA7ATTAACCGCCGCTACGATCTCCCAGGCTCCTATGAAAGCTGGAAAACCACCATCGTCTCG1TTG7GCCGTCAGA7 

76 FDRMTVTLMErDISASPFHPTOPTRKIVGRALR 

1301 AT1CGACAGAA7GACTGTCACGCTTAACGAATA7GATA7TTCTGC7TCGCCATTCCACCCGACAGACCCGACGAGAAAAATTG7AGGCCGGGCT7TACGG 

109 C I E RAPL T KEEHOTR FT I'KHYUCCLGHAGYCTVS 

1401 TGTATTGAACGTGCTCCTCTTACACACGAAGAAATGCACACTCGGTTTACTATCATGATGTATTGGTGTTGTCTTGGACATGCTGGATACTGTACTCTTT

H3 R L Y E KNVR L HD 1 VGSATGCG 1 SPLPE IESYUCP 

1501 CGCGCTTA7ATGAGAAGAA7G7CCGTCT7ATGGACATAG7AGGT7CGGCAACGGGCTGTGGAATAAGTCCACTCCCCGAAATAGAGTCT7A7TGGAAACC 

176 LCRAVATKGNAAIGDDAEL AHYLTWIRESPIGO 

1601 TTTA7GTCGTGCCGTCGCTACTAACGCGAA7GCAGCAATCGCTGATGATGCTGAATTGGCACATTATCTGACAAATCTTCGGGAATCGCCAACAGGAGAC 

209 G E S Y L * 213 

1701 GGGGAATCC7ACTTATAACTAA7CGCACAATTA77AATAGGATTTTAGGAAAAACTGCTACTAACGTTGTT1AAATAATAAAATTT7ATTT TCAA7AAGG 

1801 CAT1ACAG7GTTGTCATGATTGTATGTATTATATGGCGTATGCA7GAGGATTACTTCGATTGAAACTTTG7CTAAATGTCTCTAGGAT771ACTATTCAT 

351 •IPYAHPHSRNFSORFTOLIKSMK 

1901 7AGTCTGGATCGAGGCGGACGTAAATGGAGATTGCGGCAAATGTAGGGGTGCTGG7ACATAAGACCTCCAACATCCATTCGACTCATCGGCC7GCG1CCA 

328 LRSRPPRLHLHRCIYPHQYHLGGVOKRSKPRRG 

2001 AATGGATAIGTTGATGTACCT7G1AAAGT7A1GACATTAGAAGATCGATGGTGAATAGTGGCATC1ATATCCATGCTATTCTCAATATTGCA1GATATGC 

295 FPYTS1GOLTIVMSSRHH! TPDIDHSKElKCSl 

2101 AATGnCCCGGTTAGGT71GAlAAGAlCATGTATGGTTCTATAAtACAAC7CC7CTTCAGAAGAATCATTTATT7TA7GTCCACTGTCC17GGA7A77CC 

262 CHERHPKllDHITRYYt.EEESSDMIKHGSDtrSlG 

2201 AGT7TCTGTCAA7CGA7TCGCTTGCATTTGCG7GCAGCATGTCT7CATGGCATTTCCTATGCTATCA7CCGGCAGGCCTAAGGGTGTTCTATACTCGCAC 

228 7E1 LRHAOKOTCCTlCIAWG 1 SDDPIGLPTRYEC 

2301 ACACGTAGAGCAAGAACCACGGCATATCGAGCTACCTCTAT7GCCCCGCTAAGGACATTTCTTGCAGACTGTAT7GTCATGAACATA7TTCCTGTATTGT 

195 VPLALVVAYRAVEIAGSUVHRASOI THFKHRTK 

2401 G7CCATCA7AACCCTTGTTCA77CCTA7GCAAAGCA77GTGGTCCAGTT7TCCAGATGAAATGAAAACAATGCGGGCAAAAATGCTCCCACCTGTTTCAT 
162HR0YG«N!GISLMTTUMELMFSFLAPLFPGV0KH 

2501 CTTCAATGCATCTC7CACATCCCAAGTTCTA7AGAATATTCTCCACTGACCAGTTTCGGTAAGATCAGTTTCTGTAAAA7TTGTCATAGTTTCAATCGAA 

128 tCLADRVDUTRYFI RUOGTET LDTETFNT J TEIS 

2601 AACATTT7GTCCATCATCGCAAAAAATCTATAGGCACACCAGA7AACCA7T7GACACCACATATCCTTG7GtATATCAAACGATGTAATAGATCCCTCGT 

95 FKKOHMAFFRYASUIVKQCWMOKH IDF ST I SGE 

2701 TAGTAGA7ATGGTACATAAAAGGCCTAATCTCTCTCGGGCTTCCATACATTGAACGATTCCTTCTGTGAATTCATCAACAACCACATGCCAAAAATTTAC 
62MTSI TCLLGLRERAEKCOV1 GE T FEOVVVHUFNV 

e . nr4 2801 ATTAGTAATCTTTCTCGGTGGCTTACCAAATCGTCCTCTTGGTATATCCATArCATCGAACATTGTAGCATTGACTCTCCTCATCGTTGTCTTTCAAATG 

SORF1 28 NTI CRPPKGFRGRP IOHDD FHTANVRSK 




Fig. 2 Continued 



3201 TTTCTICCWCATITTCWCAACACMTMtCTCCCCCSTCTACTACTCATITGWGIGCTTCWTTTCCCGAGCTTTTACAGCATTGCCtCCCCACCCG 
241 KKCCKGCCFtRGPTSSHOPPEIEPPKtPWPPSG 

3301 AGGATTTTGTATACACATACCATATCACTGTCGCAAAMIGCGCTCTATCTTCTGGGGTGTCGAACTTCGGTTCCCATGTAGMGICAAGAGAG^ 
208 LIKtVCVKOSOCFMAROEPTOFKPEUTS'1-i.iw 

3401 TATTGTCGGGAAtGGCCCACGGCATACCGGACCAGGTCCCAGACACTTIGATTGC 

,50, TATATCTGCCGCCCCAACTATCCACAAGCTATGTGGAGCATTACCAGMACTTCAGA^ 

t D A A G V I Wl5 
360! GAACATCCTGCAAWTCTTCAAATAGCCGCACTATAAACGAATCCCTAGTTCCGGW 

3701 TTAC1ATCGGTCGATGTTGCCGAGGAAGAATIAACATGGGTTTGGCAAMCGGAATAGGTCIGCAGCTCTGGCGATTATGGK 

75 R V 1 P R H 0 R P I I LMPKAF R f LOA*ka 

3801 T ATT T G T T CCAT ACM T G CTTTATAACCAAT AT CCAT AAAGTAGAT GCAGCAT CT CT AGAT CT TCCT GGCAAT CGAT CGC AT ^C** £T AG AAGTGT G ACT 
41 joEHCQlCltFlWLTSAAORSRGPLROCEDLLTV 

3901 ATAGTTATCATGGACACACCOkTCTTCACCTCCACCAATAATCTTTTTTAJ^GTTAATAACTGGGCCGGTCTG 
US2 8 1 T I H S V G K MECGISSSKVHOS 

(p K ) 4001 TATGAAA CAGGGTTAAAACTAGGTAATAGACTGGATGTCTTCGACTCCGGAGGCAGAAACGATGGAATGTGGCATTTCTTCGTCGAAAGTACACGACTCT 

1/ r T U T V YCllHMSINGTDTTLFOTFPOSTOMAEVT 
4lil MAACTAATACTACCTACCGAATJATA^ACAGCATCAATGGTACGGATA 

ifl r n V D O VKTESSPESOSEOLSPFGMOGMESPETV 
420? CGGGGGATGTGGACGATGTGAAGACTGACAGCTCTCCCGAGTCCCAATCTGAAGATTTGTCACCTTTTGGGAACGATGGAAATGAATCCCCC 

«1 t n l O A V SAVRHQTNIVSSLPPGSEGY I YVCTKR 
«01 GACCGACATTGATGCAGTTTCAGCTGTGCGAATKAGTATAACATTGTTTCATCGTTACCGCCCGGATCTGAAGGGTATATCTATGTTTGTACA 

«zrn--«TERKVIVKAVT6CKTLCSE 1 O ! UKKKSHRS I 
4401 CGGGATAATACCAAGAGAAAAGTCATTGTGAAAGCTGTGACTGGTGGCAAAACCCTTGGGAGTGAAATTC 

,/a I plVHAYRUCSTVCKVMPKYICCDL FTY 10 IHGP 
450? TAATT AGAT TAGTTCATGCT TAT AGATGGAAATCGACAGTTTGTATGGTAATGCCTAAATACAAATGCGACTTGTTTACGTACATAGATA 

, p , u o I ITIERGLLGALAYIHEICGI IHROVKTE 
till ATTGCCACTAAATCAAATAATTACGATAGAACGGGGTTTGCTTGGAGCATTGGCATATATCCACGAAaAGGGTATAATACATCGTG 

-»,/ui f.ocpehvvlgofgaackldehtoicpkcygus 
M\ IIatatatttttggataaacctwaaatgtagtattgggggactttggggcagcatgtaaattagatgaacatacagataaacccaaatgttatggatgga 

,/o riiPTUSPELLALDPYCTKTOIWSAGLVLFEHS 

48m gtLIcictggaJccaattcgcctgaactgcttgcacttgatccatactgtacaaaaactgatatatggagtgcaggatta^ 

,o, vKoiTFFGKQVWGSGSOLRSIIRCLOVKPLEFP 

490! mujjIm 

\\L OUMSTNLCKHF KOYAIOLRHPYA I P 0 I 1RKSGHT 

5001 JagLaatktIcaaacttatgcamcacttcaagcagtacgcgattcagttacgacat 

XLR MDLEYAl AKKLTFDOEFRPSAQO I l H l P I FTKE 

5101 cgatggatcttgaatatgctattgca^^ 
sfo\ acccgctLcgcattatacIcgataIctSccgct 

QOPFO 1 K A P S G P T P Y S H R P 0 ! K 

5301 GCAATAATGACAACATTCGAAGTCTTTGAAGATTCGCAGACCTTTTTTGCGAATGGCACCTTCGGC^ACCTACGCCATATTCCCACAGACCGCAAATAAAG 

i7uvrTFSDCKRYTLH0ESKVDORCSOl HNSLAOSH 
5401 CATTATGGAACATTTTCGGATTGCATGAGATATACTCTAAACCATGAGAGTAAGGTAGATGATAGATGTTCAGACATACATAACTCCTTAGCACAA 

vTssHSVKMOSEECPLlNGPSMOAEOPKSVFYC 
5501 ATGTTACTTCAAGCATGTCTGTAATGAACGATTCGGAAGAATGTCCATTAATAAATGGACCTTCGATGCAGGCAGAGGACCCTAAAAGTGTTTTTTATAA 

Ri vBrPDRSRDFSWOMLMSHGNSGLRREKYl RSSK 
5601 AGTTCGTAAGCCTGACCGAAGTCGTGATTTTfCATGGCAAAATCTGAACTCCCATGGCAATAGTGGTCTACGTCGTGAAAAATAj^ 
Fig. 2 Continued


(gi) 


117RRWKMPEIFICVStKCESI CACMClKISFSFF«U7 
5701 ACCCCATCGAXGAATCCCGAGA TATTTAA GGTATCTTTGikAATGTCAATC^ATTCCCGCTCCTAACGG AATAAAAAT TTCATTCTCATTTTTCTAACATT 

5801 ATAATATATCAGATCGTTTCTTATATACTTATTTTCATCGTCGCGATATGACTAACGTATACTAAGTTACAAGAAACAACTGCTTAACGTCGAACATAAC 


(JS6 1 MNRYRYES1FFRY 

(gD) 5901 GG AAATAAAAATATATATAG CGTCTCC TATAACT GTTATATTGGCACCTTTTAGAGCTTCGGTATGAATAGATACAGATATGAAAGTATT7TTTTTAGAT 


14 ISSTRH1L11CLLLCT60HSAMGLKKDHSPI IP 

6001 AT AT CT CAT CCACGA GA ATGAT TCT T AT AAT C TGTTTACTTT TGGGAACT GGGGACAT GT CCGCAA T GGGACT T AAGAAAG AC AA T T C7 CCGAT CA T T CC 

47 TlHPICGNEHLRATlNEYICtPSPLFDTlDMSYET 

6101 CACATTACATCCGAAAGGTAATGAAAACCTCCGGGCTACTCTCAATGAATACAAAATCCCGTCTCCACTGTTTGATACACTTGACAATTCATATGAGACA 

80KHV]YTDN"SFAVLNPFGOPKYTLLSLLLHCRRK 

6201 AAACACGTAATATATACGGATAATTGTAGTTTTGCTGTTTTGAATCCATTTGGCGATCCGAAATATACCCTTCTCAGTTTACTGTTGATGGGACGACGCA 

1H ydalvaufvigracgrpiyireyam~c~stnepfg 

6301 aatatgatgctctagtagcatggtttgtcttgggcawgcatgtgggagaccaatttatttacgtgaatatgccaactgctctactaatgaaccatttcg 

147 tcklkslgwworryahtsyi droelkl i i a a p s 

6<oi aacttgtaaattaaagtccctaggatggtgcgatagaagatatgcmtgacgagttatatcgatcgagatgaattgaaattgattattgcagcacccagt 

180 relsglytrli ! ihgepissd 1 lttvkctcsfsr 

6501 cgtgagctaagtcgattatatacccgtttaataattattaatcgagaacccatttccagtgacatattactgactgttaaaggaacatgtagtttttcga 

2K rgi kdwkickpfsffvn~g~ttrlldhvrtgtpra 

6601 gacgggggataaaggataacaaactatgcuuiaccgttcagtttttttgtcaatggtacaacacggctgttagacatggtgcgaacaggaaccccgagagc 

247 heekvkqulerhggchtpivvetshoovshlpr 

6701 ccatgaagaaaatgtgaagcagtggcttgaacgaaatcgtggtaaacatctaccaatcgtcgtcgaaacatctatgcaacaagtctcaaatttcccgaga 

280 sfrdsylkspdddkyndvkhtsattnk t 7 s V o c 

6801 agttttawgattcatatttaaaatcacctgacwcc^taaatataatgacgtcaaaatgacatcgcccactactaataacattaccacctccgtcgatg 

3K YTGLTMRPEOFEKAPYI T KRP I r sveeassqsp 

6901 gttacactggactcactaatcggcccgaggactttgagaaagcaccatacataactaaacgaccgataatctctgtcgaggaggcatccagtcaatcacc 


347 KI STEKKSRTOI ! 1SLV VL C VHFCF I VI GSG IU 

7001 taaaatatcaacagaaaaaaaatcccgaacgcaaataataatttcactagttgttctatgcgtcatgttttgtttcattgtaatcgggtctggtatatgc 

380 1 i rkkrktvkydrrrpsrraysr l * 403 

7101 atccttcgcaaacaccgcaaaacggtgatgtatgatagacgtcgtccatcaagacgggcatattcccgcctataacacgtgtttggtatgggcgtgtcgc - 


US7 1 H Y V L 0 L L 


7201 tatagtgcataagaagttgactacattgatcaatcacat tatatag cttctttggtcagatagacggcgtgtctgattgcgatgtatgtactacaattat 


8 fwirlfrgiusivttgtsvtlstdosalvafrg 

7301 tattttggatccgcctctttcgaggcatctggtctatagtttatactgcaacatctgttacgttatcaaccgaccaatctgctcttgttgcgttccgcgg 

41 idkkvmvrgollfigootrtssytgt te i t k u d 

7401 attagataaaatggtgaatgtacgcggccaacttttattcctgggcgaccagactcggaccagttcttatacaggaacgacgcaaatcttgaaatggcat 

74eeykcysvlhatsymdcpa1datvfrgcrdavvy 

7501 gaagaatataaatgctattccgttctacatgcgacatca7atatggattgtcctcctatagacgccacgctattcagaggctgtagacaccctgtggtat 

108 aqphgrvqpfpekgtllr i veprvsdtgsyy ir 

7601 atgctcaacctcatggtagagtacaaccttttcccgaaftagggaacattgttgagaattgtcgaacccacagtatcagatacaggcagctattacatacg 

141 vslagrh~k"sdifrkvvi irssksuachVsassf 

7701 tgta1ctctcgctggaagaaatatgagcgatatatttagaatggttgttattataaggagtagcaaatcttgggcctgtaatcactctgctagttcattt 

174oahkciryv0rhafekylighvgkllds0selha 

7801 caggcccataaatgtattcgc7atgtcgaccgtatggcctttgaaaattatctgat tggacatgtaggcaatttgctggacagtgactcggaattgcatg 

208 1 Y N YpQSISTOIHIVTTPFYOHSGT I YSPTVF 

7901 CAATTTATAATATTACTCCCCAATCCATTTCCACAGATATTAATATTGTAACCACTCCATTTTACGATAATTCGGGAACAATTTATTCACCTACCGTTTT 


241 HLFKKKSHVOAKKSTGKUHTVt KY TLPRL IYFS 
8001 TAATTTGTTTAATAACAATTCCCATGTCGA7GCAATGAATTCGACTGGTATGTGGAATACCGTTTTAAAATATACCCTTCCAAGGCTTATTTACTTTTCT 


274 TMIVLCIlAtAIYLVCERCRSPHRR!Y1CEPRSD 

8101 ACGATGATTGTACTATGTATAATAGCATTGGCAATTTATTTGCTCTGTGAAAGGTGCCCCTCTCCCCA7CGTAGGATATACATCGGTGAACCAAGATC7C 

308 EAPL ITSAVWESFOYOYMVKE TPSOVI EKELKE 

8201 ATGAGGCCCCACTCATCACTTCTGCAGT TAACGAATCATTTCAATATGATTA1AATGTAAAGGAAACTCCTTCAGATGTTATTGAAAAGGAGTTCATGGA 

341 KIKKICVELIEREECV'355 

8301 AAAACTGAAGAACAAAG7CGAATTGTTGGAAAGAGAAGAA7GTGTATAGGTTTGAGAAAC7A7TA1ACGTACGTGGTACCTGTTAGCT1AGTATAAGGGG 
Fig. 2 Continued

US8 .1 h c v f o 

(gE» 8401 ACGACCCCTTTCTTCTTTTAAACACACGAACACAACGCCGTAAGTniH^TCAATTTTCTCCATCTCTCCGAGTCAGCGTCATAATCTGTCTTTTCC 

6 ILIIVTT1CVAGT AHINH10VPAGKSATTT1PR 

8501 AAATCCTGATAATAGTGACGACGATCAAAGTAGCTGGAACGGCCAACATAAATCATATAGACGTTCCTGCAGGACATTCTGCTACAACGACGATCCCGCG 

39 YPPVVDCTLYTETUTV!PMHCH~E~TATGYVCLES 

8601 ATATCCACCAGTTGTCGATGGGACCCTTTACACCGAGACGTGGACATGGATTCCCAATCACTGCAACGAAACGGCAACAGGCTATGTATGTCTGGAAAGT 

72AHCFT0L I LGVSCHRYAOE 1 V L R T D K F 1 V 0 A 6 s" I 

8701 GCTCACTGTTTTACCGATTTGATATTAGGAGTATCCTCCATGAGGTATGCGGATGAAATCGTCTTACGAACTGATAAATTTATTGTCGATGCGGGATCCA 

106 koieslslngvphiflstkasnkleilmVsloii 

8801 T T AAACAAAT AGAAT CGC T AAG T CT GAAT GG AG T T CCGAAT AT AT TCCTAT CT ACGAAAGCA AG T AAC AAGT T GGAG AT ACT AAAT GC T AGCC TACAAAA 

139 AGIYIRYSRN ~G~ TRTACLDVVVVGVI.GOARDRLP 

8901 TGCGGGTATCTACATTCGGTATTCTAGAAATGGGACGAGGACTGCAAAGCTGGATGTTGTTGTGGTTGGCGTTTTGGGTCAAGCAAGGGATCGCCTACCC 

172QHSSPMISSHAOIICLSLICNFk:ALVYHVGOTlM"v"s 

9001 CAAATGTCCAGTCCTATGATCTCATCCCACGCCGATATCAAGTTGTCATTAAAAAACTTTAAAGCATTAGTATATCACGTGGGAGATACTATCAATGTCT 

206 tavilgpspeiftlefrvlflryn"p"tcicfvtiy 

9101 cgacgccgcttatactaggaccttctccggagatattcacattggaatttagggtgttgttcctccgttataatccaacgtgcaagttcgtcacgattta 

239 epcifhpkepecittaeosvchfasmioiloia 

9201 tgaaccttctatatttcaccccaaagaaccacagtctattactactgcagaacaatcggtatgtcatttcccatccaacattgacattctgcagataccc 

272 aarsem"c"stgyrrciyotaidesvoarltfiepg 

9301 gccgcacgttctgaaaattgtagcacagggtatcgtagatgtatttatgacacggctatcgatgaatctgtgcaggccagattaacattcatagaaccag 

306 IPSFKKKDVOVOOAGL.Y yvVALYKGRPSAUTYI 

9401 gaattccttcctttaaaatgaaagatgtccaggtagacgatgctggattgtatgtggttgtgcctttatacaatggacgtccaagtgcatcgacttacat 

339 ylstvetylmvtehyh,k.pgfcyksflow"s"sivo 

9501 ttatttctcaacggtggaaacatatcttaatgtatatgaaaactaccacaagccgggatttgggtataaatcatttctacag^ 

372 E N E A S 0 U S S S S I K R R N M "c" T I I YD f L L T S L S I G A '\ 

9601 GAAAATGAGGCTAGCGATTGGTCCAGCTCCTCCATTAAACCGAGAAATAATGGTACTATCATTTATGATATTTTACTCACATCGCTATCAATTGGGGCGA 


456 1 I V I VGGVC1AI LI RR^R ,R .R RRTRGLFOEYPtCYK 

9701 TTATTATCCTCATAGTAGGGGCTGTTTGTATTGCCATATTAATTAGGCGTAGGAGACGACGTCGCACGACGGGGTTATTCGAT6AATATCCCAAATATAT 

439 TLPGMOLGGMNVPYOMTCSGMOVEYYOECSAICH 

9801 CACGCTACCAGGAAACGATCTGGGGGGCATGAATGTACCGTATGATAATACATGCTCTGGTAACCAAGTTGAATATTATCAACAAAAGTCGGCTAAAATG 

472 CRHGSGYTAVLCMOHPKI RKRLOlYK* 497 

9901 AAAAGAATCGGTTCGGCTTATACCGCTTGGCTAAAAAATCATATGCCGAAAATTAGGAAACGCTTAGATTTATACCACTGATATGTACATATTTAAACTT 

10001 AATGGGATATAGTATATGGACCTrrATATGACGAGAGTAAATAAACTGACAATGCAAATGAAGCTGATCTATATTGTGCTTTATATTGGGACAAACCACT 

10101 CGCACAAGCTCATICAACACATCCACTCTTGCTAT1AAATTCCCCATTATATAACAATACTGACATAACACTCATATTAAGGGGAGAAAATAAATATGCA 

10201 TGGCCGATCATATTTTATTGAGATCCGAAAATATATCATGCAAATAAGCATGTTCTAGCACCACTGCAACATGTGGTTTATCGATTTCCGGAAAGAATAG 

10301 TTCAACCATTCCCTCCGAGCAGTTGGCGATCCGTTGACCTGCAGGTCGAC 10350 
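The one-letter translations printed above the nucleotide lines of Figure 2 are only partly legible in this copy. The following minimal Python sketch re-derives them from a nucleotide stretch using the standard genetic code (NCBI translation table 1). The names `translate` and `CODON_TABLE` are hypothetical, and OCR debris in the sequence (for example a `7` where a T is presumably meant) has to be resolved by hand first, since any character other than A, C, G or T raises a `KeyError`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Lower-case input is accepted; a trailing partial codon is ignored.
    Characters other than A, C, G, T raise KeyError rather than being guessed.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Start of the gD reading frame on the 5901 line of Figure 2.
print(translate("ATGAATAGATACAGATATGAAAGTATTTTTTTTAGA"))  # MNRYRYESIFFR
```

Here the input is the start of the gD reading frame on the 5901 line, with the OCR `7` on that line read as T (an assumption); the result matches the first twelve residues of the gD precursor claim.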
Fig. 3A A. Multiple alignments displaying regions of maximum amino acid
conservation. 
Fig. 3B B. Dot matrix analyses depicting overall homologies.
RS- 5

MOLECULAR CLONING OF A CONSTRUCT CONTAINING THE DNA ENCODING MDV gI AND PART OF MDV gE